Goal:
Predict the probability of an online credit card transaction being fraudulent, based on different properties of the transactions.
The goal of this section is to:
# Data Manipulation
import numpy as np
import pandas as pd
# Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
# Time
import time
import datetime
# Machine Learning
from sklearn.preprocessing import LabelEncoder, minmax_scale
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import confusion_matrix , classification_report, accuracy_score, roc_auc_score, plot_roc_curve, precision_recall_curve, plot_precision_recall_curve
from sklearn.calibration import calibration_curve
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from imblearn.over_sampling import RandomOverSampler
from scipy.stats import chi2_contingency, f_oneway
import gc
import warnings
from tqdm import tqdm
# Set Options
pd.set_option('display.max_rows', 800)
pd.set_option('display.max_columns', 500)
%matplotlib inline
warnings.filterwarnings("ignore")
Purpose is to:
The data is broken into two files identity and transaction, which are joined by “TransactionID”.
Note: Not all transactions have corresponding identity information.
Load the transaction and identity datasets using pd.read_csv()
%%time
# Load Data
df_id = pd.read_csv('Data/train_identity.csv')
df_tran = pd.read_csv('Data/train_transaction.csv')
Wall time: 21.3 s
# Identitiy Data
df_id.sample(6)
| TransactionID | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_07 | id_08 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_18 | id_19 | id_20 | id_21 | id_22 | id_23 | id_24 | id_25 | id_26 | id_27 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 601 | 2990046 | -5.0 | 821814.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | 100.0 | NotFound | 49.0 | -300.0 | New | NotFound | 102.0 | 15.0 | 410.0 | 360.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | iOS 11.1.2 | mobile safari 11.0 | 32.0 | 2208x1242 | match_status:1 | T | F | F | F | mobile | iOS Device |
| 29589 | 3065170 | -10.0 | 103219.0 | 0.0 | 0.0 | 2.0 | -5.0 | NaN | NaN | 0.0 | 0.0 | 100.0 | Found | 52.0 | NaN | Found | Found | 166.0 | 13.0 | 216.0 | 214.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Found | Found | NaN | ie 11.0 for desktop | NaN | NaN | NaN | F | F | T | T | desktop | rv:11.0 |
| 97539 | 3342636 | -20.0 | 175191.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN | 100.0 | NotFound | 33.0 | NaN | New | NotFound | 225.0 | NaN | 266.0 | 305.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | NaN | samsung browser generic | NaN | NaN | NaN | F | F | T | F | mobile | SAMSUNG SM-G610M Build/NRD90M |
| 141317 | 3561291 | -5.0 | 135838.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 100.0 | NotFound | 52.0 | -480.0 | Found | Found | 166.0 | NaN | 193.0 | 222.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Found | Found | Windows 10 | chrome 66.0 | 24.0 | 1920x1080 | match_status:2 | T | F | T | F | desktop | Windows |
| 65961 | 3161995 | -5.0 | 22457.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 100.0 | NotFound | 49.0 | -360.0 | Found | Found | 166.0 | NaN | 193.0 | 333.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Found | Found | Mac OS X 10_13_2 | chrome 63.0 | 24.0 | 2560x1600 | match_status:2 | T | F | T | F | desktop | MacOS |
| 138982 | 3549519 | -10.0 | 417370.0 | NaN | NaN | 3.0 | -23.0 | NaN | NaN | NaN | NaN | 100.0 | NotFound | 27.0 | NaN | New | NotFound | 225.0 | 15.0 | 290.0 | 127.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | NaN | mobile safari 11.0 | NaN | NaN | NaN | F | F | T | F | mobile | NaN |
Variables in this table are identity information – network connection information (IP, ISP, Proxy, etc) and digital signature (UA/browser/os/version, etc) associated with transactions. They're collected by Vesta’s fraud protection system and digital security partners. (The field names are masked and pairwise dictionary will not be provided for privacy protection and contract agreement)
Categorical Features:
# Transaction Data
df_tran.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | dist2 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | V29 | V30 | V31 | V32 | V33 | V34 | V35 | V36 | V37 | V38 | V39 | V40 | V41 | V42 | V43 | V44 | V45 | V46 | V47 | V48 | V49 | V50 | V51 | V52 | V53 | V54 | V55 | V56 | V57 | V58 | V59 | V60 | V61 | V62 | V63 | V64 | V65 | V66 | V67 | V68 | V69 | V70 | V71 | V72 | V73 | V74 | V75 | V76 | V77 | V78 | V79 | V80 | V81 | V82 | V83 | V84 | V85 | V86 | V87 | V88 | V89 | V90 | V91 | V92 | V93 | V94 | V95 | V96 | V97 | V98 | V99 | V100 | V101 | V102 | V103 | V104 | V105 | V106 | V107 | V108 | V109 | V110 | V111 | V112 | V113 | V114 | V115 | V116 | V117 | V118 | V119 | V120 | V121 | V122 | V123 | V124 | V125 | V126 | V127 | V128 | V129 | V130 | V131 | V132 | V133 | V134 | V135 | V136 | V137 | V138 | V139 | V140 | V141 | V142 | V143 | V144 | V145 | V146 | V147 | V148 | V149 | V150 | V151 | V152 | V153 | V154 | V155 | V156 | V157 | V158 | V159 | V160 | V161 | V162 | V163 | V164 | V165 | V166 | V167 | V168 | V169 | V170 | V171 | V172 | V173 | V174 | V175 | V176 | V177 | V178 | V179 | V180 | V181 | V182 | V183 | V184 | V185 | V186 | V187 | V188 | V189 | V190 | V191 | V192 | V193 | V194 | V195 | V196 | V197 | V198 | V199 | V200 | V201 | V202 | V203 | V204 | V205 | V206 | V207 | V208 | V209 | V210 | V211 | V212 | V213 | V214 | V215 | V216 | V217 | V218 | V219 | V220 | V221 | V222 | V223 | V224 | V225 | V226 | V227 | V228 | V229 | V230 | V231 | V232 | V233 | V234 | V235 | V236 | V237 | V238 | V239 | V240 | V241 | V242 | V243 | V244 | V245 | V246 | V247 | V248 | V249 | V250 | V251 | V252 | V253 | V254 | V255 | V256 | V257 | V258 | V259 | V260 | V261 | V262 | V263 | V264 | V265 | V266 | V267 | V268 | V269 | V270 | V271 | V272 | V273 | V274 | V275 | V276 | V277 | V278 | V279 | V280 | V281 | V282 | V283 | V284 | V285 | V286 | V287 | V288 | V289 | V290 | V291 | V292 | V293 | V294 | V295 | V296 | V297 | V298 | V299 | V300 | V301 | V302 | V303 | V304 | V305 | V306 | V307 | V308 | V309 | V310 | V311 | V312 | V313 | V314 | V315 | V316 | V317 | V318 | V319 | V320 | V321 | V322 | V323 | V324 | V325 | V326 | V327 | V328 | V329 | V330 | V331 | V332 | V333 | V334 | V335 | V336 | V337 | V338 | V339 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 68.5 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NaN | NaN | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2987001 | 0 | 86401 | 29.0 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NaN | gmail.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2987002 | 0 | 86469 | 59.0 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | NaN | outlook.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2987003 | 0 | 86499 | 50.0 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | NaN | yahoo.com | NaN | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 48.0 | 28.0 | 0.0 | 10.0 | 4.0 | 1.0 | 38.0 | 24.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 50.0 | 1758.0 | 925.0 | 0.0 | 354.0 | 135.0 | 50.0 | 1404.0 | 790.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 28.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 38.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 50.0 | 1758.0 | 925.0 | 0.0 | 354.0 | 0.0 | 135.0 | 0.0 | 0.0 | 0.0 | 50.0 | 1404.0 | 790.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2987004 | 0 | 86506 | 50.0 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NaN | gmail.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 18.0 | 140.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1803.0 | 49.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 15557.990234 | 169690.796875 | 0.0 | 0.0 | 0.0 | 515.0 | 5155.0 | 2840.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
There are few features which has data that needs smaller memory size to hold it but the current data type is occupying more memory. Hence reducing memory usage by the data is very much necessary. This section gives a function to modify the data type of each feature.
df_id.memory_usage(deep=True).sum() / 1024**2
157.63398933410645
df_tran.memory_usage(deep=True).sum() / 1024**2
2100.701406478882
df_tran.dtypes
TransactionID int64 isFraud int64 TransactionDT int64 TransactionAmt float64 ProductCD object card1 int64 card2 float64 card3 float64 card4 object card5 float64 card6 object addr1 float64 addr2 float64 dist1 float64 dist2 float64 P_emaildomain object R_emaildomain object C1 float64 C2 float64 C3 float64 C4 float64 C5 float64 C6 float64 C7 float64 C8 float64 C9 float64 C10 float64 C11 float64 C12 float64 C13 float64 C14 float64 D1 float64 D2 float64 D3 float64 D4 float64 D5 float64 D6 float64 D7 float64 D8 float64 D9 float64 D10 float64 D11 float64 D12 float64 D13 float64 D14 float64 D15 float64 M1 object M2 object M3 object M4 object M5 object M6 object M7 object M8 object M9 object V1 float64 V2 float64 V3 float64 V4 float64 V5 float64 V6 float64 V7 float64 V8 float64 V9 float64 V10 float64 V11 float64 V12 float64 V13 float64 V14 float64 V15 float64 V16 float64 V17 float64 V18 float64 V19 float64 V20 float64 V21 float64 V22 float64 V23 float64 V24 float64 V25 float64 V26 float64 V27 float64 V28 float64 V29 float64 V30 float64 V31 float64 V32 float64 V33 float64 V34 float64 V35 float64 V36 float64 V37 float64 V38 float64 V39 float64 V40 float64 V41 float64 V42 float64 V43 float64 V44 float64 V45 float64 V46 float64 V47 float64 V48 float64 V49 float64 V50 float64 V51 float64 V52 float64 V53 float64 V54 float64 V55 float64 V56 float64 V57 float64 V58 float64 V59 float64 V60 float64 V61 float64 V62 float64 V63 float64 V64 float64 V65 float64 V66 float64 V67 float64 V68 float64 V69 float64 V70 float64 V71 float64 V72 float64 V73 float64 V74 float64 V75 float64 V76 float64 V77 float64 V78 float64 V79 float64 V80 float64 V81 float64 V82 float64 V83 float64 V84 float64 V85 float64 V86 float64 V87 float64 V88 float64 V89 float64 V90 float64 V91 float64 V92 float64 V93 float64 V94 float64 V95 float64 V96 float64 V97 float64 V98 float64 V99 float64 V100 float64 V101 float64 V102 float64 V103 float64 V104 float64 V105 float64 V106 float64 V107 float64 V108 float64 V109 float64 V110 float64 V111 float64 V112 float64 V113 float64 V114 float64 V115 float64 V116 float64 V117 float64 V118 float64 V119 float64 V120 float64 V121 float64 V122 float64 V123 float64 V124 float64 V125 float64 V126 float64 V127 float64 V128 float64 V129 float64 V130 float64 V131 float64 V132 float64 V133 float64 V134 float64 V135 float64 V136 float64 V137 float64 V138 float64 V139 float64 V140 float64 V141 float64 V142 float64 V143 float64 V144 float64 V145 float64 V146 float64 V147 float64 V148 float64 V149 float64 V150 float64 V151 float64 V152 float64 V153 float64 V154 float64 V155 float64 V156 float64 V157 float64 V158 float64 V159 float64 V160 float64 V161 float64 V162 float64 V163 float64 V164 float64 V165 float64 V166 float64 V167 float64 V168 float64 V169 float64 V170 float64 V171 float64 V172 float64 V173 float64 V174 float64 V175 float64 V176 float64 V177 float64 V178 float64 V179 float64 V180 float64 V181 float64 V182 float64 V183 float64 V184 float64 V185 float64 V186 float64 V187 float64 V188 float64 V189 float64 V190 float64 V191 float64 V192 float64 V193 float64 V194 float64 V195 float64 V196 float64 V197 float64 V198 float64 V199 float64 V200 float64 V201 float64 V202 float64 V203 float64 V204 float64 V205 float64 V206 float64 V207 float64 V208 float64 V209 float64 V210 float64 V211 float64 V212 float64 V213 float64 V214 float64 V215 float64 V216 float64 V217 float64 V218 float64 V219 float64 V220 float64 V221 float64 V222 float64 V223 float64 V224 float64 V225 float64 V226 float64 V227 float64 V228 float64 V229 float64 V230 float64 V231 float64 V232 float64 V233 float64 V234 float64 V235 float64 V236 float64 V237 float64 V238 float64 V239 float64 V240 float64 V241 float64 V242 float64 V243 float64 V244 float64 V245 float64 V246 float64 V247 float64 V248 float64 V249 float64 V250 float64 V251 float64 V252 float64 V253 float64 V254 float64 V255 float64 V256 float64 V257 float64 V258 float64 V259 float64 V260 float64 V261 float64 V262 float64 V263 float64 V264 float64 V265 float64 V266 float64 V267 float64 V268 float64 V269 float64 V270 float64 V271 float64 V272 float64 V273 float64 V274 float64 V275 float64 V276 float64 V277 float64 V278 float64 V279 float64 V280 float64 V281 float64 V282 float64 V283 float64 V284 float64 V285 float64 V286 float64 V287 float64 V288 float64 V289 float64 V290 float64 V291 float64 V292 float64 V293 float64 V294 float64 V295 float64 V296 float64 V297 float64 V298 float64 V299 float64 V300 float64 V301 float64 V302 float64 V303 float64 V304 float64 V305 float64 V306 float64 V307 float64 V308 float64 V309 float64 V310 float64 V311 float64 V312 float64 V313 float64 V314 float64 V315 float64 V316 float64 V317 float64 V318 float64 V319 float64 V320 float64 V321 float64 V322 float64 V323 float64 V324 float64 V325 float64 V326 float64 V327 float64 V328 float64 V329 float64 V330 float64 V331 float64 V332 float64 V333 float64 V334 float64 V335 float64 V336 float64 V337 float64 V338 float64 V339 float64 dtype: object
Certain features occupy more memory than what is needed to store them. Reducing the memory usage by changing data type will speed up the computations.
Let's create a function for that:
print('int64 min: ', np.iinfo(np.int64).min)
print('int64 max: ', np.iinfo(np.int64).max)
int64 min: -9223372036854775808 int64 max: 9223372036854775807
print('int8 min: ', np.iinfo(np.int8).min)
print('int8 max: ', np.iinfo(np.int8).max)
int8 min: -128 int8 max: 127
# Reduce memory usage
def reduce_mem_usage(df, verbose=True):
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
start_mem = df.memory_usage(deep=True).sum() / 1024**2
for col in df.columns:
col_type = df[col].dtypes
if col_type in numerics:
c_min = df[col].min()
c_max = df[col].max()
if str(col_type)[:3] == 'int':
if c_min >= np.iinfo(np.int8).min and c_max <= np.iinfo(np.int8).max:
df[col] = df[col].astype(np.int8)
elif c_min >= np.iinfo(np.int16).min and c_max <= np.iinfo(np.int16).max:
df[col] = df[col].astype(np.int16)
elif c_min >= np.iinfo(np.int32).min and c_max <= np.iinfo(np.int32).max:
df[col] = df[col].astype(np.int32)
elif c_min >= np.iinfo(np.int64).min and c_max <= np.iinfo(np.int64).max:
df[col] = df[col].astype(np.int64)
else:
if c_min >= np.finfo(np.float16).min and c_max <= np.finfo(np.float16).max:
df[col] = df[col].astype(np.float16)
elif c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
df[col] = df[col].astype(np.float32)
else:
df[col] = df[col].astype(np.float64)
end_mem = df.memory_usage(deep=True).sum() / 1024**2
if verbose: print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
return df
Use the defined function to reduce the memory usage
# Reduce the memory size of the dataframe
df_id = reduce_mem_usage(df_id)
df_tran = reduce_mem_usage(df_tran)
Mem. usage decreased to 138.38 Mb (12.2% reduction) Mem. usage decreased to 867.89 Mb (58.7% reduction)
Before attempting to solve the problem, it's very important to have a good understanding of data.
The goal of this section is to:
# Dimensions of identity dataset
print(df_id.shape)
(144233, 41)
The dataset has 144233 rows and 41 columns
# Dimensions of transaction dataset
print(df_tran.shape)
(590540, 394)
The dataset has 590540 rows and 394 columns
Check how many transactions has ID info
# How many had ID info?
df_tran.TransactionID.isin(df_id.TransactionID).sum()
144233
df_id.head()
| TransactionID | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_07 | id_08 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_18 | id_19 | id_20 | id_21 | id_22 | id_23 | id_24 | id_25 | id_26 | id_27 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987004 | 0.0 | 70787.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 100.0 | NotFound | NaN | -480.0 | New | NotFound | 166.0 | NaN | 542.0 | 144.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | Android 7.0 | samsung browser 6.2 | 32.0 | 2220x1080 | match_status:2 | T | F | T | T | mobile | SAMSUNG SM-G892A Build/NRD90M |
| 1 | 2987008 | -5.0 | 98945.0 | NaN | NaN | 0.0 | -5.0 | NaN | NaN | NaN | NaN | 100.0 | NotFound | 49.0 | -300.0 | New | NotFound | 166.0 | NaN | 621.0 | 500.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | iOS 11.1.2 | mobile safari 11.0 | 32.0 | 1334x750 | match_status:1 | T | F | F | T | mobile | iOS Device |
| 2 | 2987010 | -5.0 | 191631.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 100.0 | NotFound | 52.0 | NaN | Found | Found | 121.0 | NaN | 410.0 | 142.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Found | Found | NaN | chrome 62.0 | NaN | NaN | NaN | F | F | T | T | desktop | Windows |
| 3 | 2987011 | -5.0 | 221832.0 | NaN | NaN | 0.0 | -6.0 | NaN | NaN | NaN | NaN | 100.0 | NotFound | 52.0 | NaN | New | NotFound | 225.0 | NaN | 176.0 | 507.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | New | NotFound | NaN | chrome 62.0 | NaN | NaN | NaN | F | F | T | T | desktop | NaN |
| 4 | 2987016 | 0.0 | 7460.0 | 0.0 | 0.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 100.0 | NotFound | NaN | -300.0 | Found | Found | 166.0 | 15.0 | 529.0 | 575.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Found | Found | Mac OS X 10_11_6 | chrome 62.0 | 24.0 | 1280x800 | match_status:2 | T | F | T | T | desktop | MacOS |
from pandas_summary import DataFrameSummary
df_id_summary = DataFrameSummary(df_id)
df_id_summary.summary()
| TransactionID | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_07 | id_08 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_18 | id_19 | id_20 | id_21 | id_22 | id_23 | id_24 | id_25 | id_26 | id_27 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 144233 | 144233 | 140872 | 66324 | 66324 | 136865 | 136865 | 5155 | 5155 | 74926 | 74926 | 140978 | NaN | 127320 | 80044 | NaN | NaN | 139369 | 45113 | 139318 | 139261 | 5159 | 5169 | NaN | 4747 | 5132 | 5163 | NaN | NaN | NaN | NaN | NaN | 77586 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| mean | 3.23633e+06 | NaN | 174717 | 0 | -0 | NaN | NaN | inf | -inf | 0 | -0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | inf | NaN | NaN | inf | inf | NaN | 12.7891 | inf | inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| std | 178850 | 0 | 159652 | 0 | 0 | 0 | 0 | 11.3828 | 26.0781 | 0 | 0 | 0 | NaN | 0 | NaN | NaN | NaN | 0 | 1.56152 | NaN | NaN | inf | 6.89844 | NaN | 2.37109 | 97.4375 | 32.0938 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| min | 2.987e+06 | -100 | 1 | -13 | -28 | -72 | -100 | -46 | -100 | -36 | -100 | 90 | NaN | 10 | -660 | NaN | NaN | 100 | 10 | 100 | 100 | 100 | 10 | NaN | 11 | 100 | 100 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 25% | 3.07714e+06 | -10 | 67992 | 0 | 0 | 0 | -6 | 5 | -48 | 0 | 0 | 100 | NaN | 49 | -360 | NaN | NaN | 166 | 13 | 266 | 256 | 252 | 14 | NaN | 11 | 321 | 119 | NaN | NaN | NaN | NaN | NaN | 24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 50% | 3.19882e+06 | -5 | 125800 | 0 | 0 | 0 | 0 | 14 | -34 | 0 | 0 | 100 | NaN | 52 | -300 | NaN | NaN | 166 | 15 | 341 | 472 | 252 | 14 | NaN | 11 | 321 | 149 | NaN | NaN | NaN | NaN | NaN | 24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 75% | 3.39292e+06 | -5 | 228749 | 0 | 0 | 1 | 0 | 22 | -23 | 0 | 0 | 100 | NaN | 52 | -300 | NaN | NaN | 225 | 15 | 427 | 533 | 486.5 | 14 | NaN | 15 | 371 | 169 | NaN | NaN | NaN | NaN | NaN | 32 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| max | 3.57753e+06 | 0 | 999595 | 10 | 0 | 52 | 0 | 61 | 0 | 25 | 0 | 100 | NaN | 64 | 720 | NaN | NaN | 229 | 29 | 671 | 661 | 854 | 44 | NaN | 26 | 548 | 216 | NaN | NaN | NaN | NaN | NaN | 32 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| counts | 144233 | 144233 | 140872 | 66324 | 66324 | 136865 | 136865 | 5155 | 5155 | 74926 | 74926 | 140978 | 144233 | 127320 | 80044 | 140985 | 129340 | 139369 | 45113 | 139318 | 139261 | 5159 | 5169 | 5169 | 4747 | 5132 | 5163 | 5169 | 140978 | 140978 | 77565 | 140282 | 77586 | 73289 | 77805 | 140985 | 140985 | 140985 | 140985 | 140810 | 118666 |
| uniques | 144233 | 77 | 115655 | 24 | 15 | 93 | 101 | 84 | 94 | 46 | 62 | 146 | 2 | 54 | 25 | 3 | 2 | 104 | 18 | 522 | 394 | 490 | 25 | 3 | 12 | 341 | 95 | 2 | 2 | 2 | 75 | 130 | 4 | 260 | 4 | 2 | 2 | 2 | 2 | 2 | 1786 |
| missing | 0 | 0 | 3361 | 77909 | 77909 | 7368 | 7368 | 139078 | 139078 | 69307 | 69307 | 3255 | 0 | 16913 | 64189 | 3248 | 14893 | 4864 | 99120 | 4915 | 4972 | 139074 | 139064 | 139064 | 139486 | 139101 | 139070 | 139064 | 3255 | 3255 | 66668 | 3951 | 66647 | 70944 | 66428 | 3248 | 3248 | 3248 | 3248 | 3423 | 25567 |
| missing_perc | 0% | 0% | 2.33% | 54.02% | 54.02% | 5.11% | 5.11% | 96.43% | 96.43% | 48.05% | 48.05% | 2.26% | 0% | 11.73% | 44.50% | 2.25% | 10.33% | 3.37% | 68.72% | 3.41% | 3.45% | 96.42% | 96.42% | 96.42% | 96.71% | 96.44% | 96.42% | 96.42% | 2.26% | 2.26% | 46.22% | 2.74% | 46.21% | 49.19% | 46.06% | 2.25% | 2.25% | 2.25% | 2.25% | 2.37% | 17.73% |
| types | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | categorical | bool | numeric | numeric | numeric | numeric | numeric | numeric | categorical | numeric | numeric | numeric | bool | bool | bool | categorical | categorical | numeric | categorical | categorical | bool | bool | bool | bool | bool | categorical |
By looking at the summary of datasets, it's clear there is a lot of missing values in the dataset.
Let's get missing value stats and various other stats of columns in dataframe.
from pandas_summary import DataFrameSummary
df_tran_summary = DataFrameSummary(df_tran)
df_tran_summary.summary()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | dist2 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | V29 | V30 | V31 | V32 | V33 | V34 | V35 | V36 | V37 | V38 | V39 | V40 | V41 | V42 | V43 | V44 | V45 | V46 | V47 | V48 | V49 | V50 | V51 | V52 | V53 | V54 | V55 | V56 | V57 | V58 | V59 | V60 | V61 | V62 | V63 | V64 | V65 | V66 | V67 | V68 | V69 | V70 | V71 | V72 | V73 | V74 | V75 | V76 | V77 | V78 | V79 | V80 | V81 | V82 | V83 | V84 | V85 | V86 | V87 | V88 | V89 | V90 | V91 | V92 | V93 | V94 | V95 | V96 | V97 | V98 | V99 | V100 | V101 | V102 | V103 | V104 | V105 | V106 | V107 | V108 | V109 | V110 | V111 | V112 | V113 | V114 | V115 | V116 | V117 | V118 | V119 | V120 | V121 | V122 | V123 | V124 | V125 | V126 | V127 | V128 | V129 | V130 | V131 | V132 | V133 | V134 | V135 | V136 | V137 | V138 | V139 | V140 | V141 | V142 | V143 | V144 | V145 | V146 | V147 | V148 | V149 | V150 | V151 | V152 | V153 | V154 | V155 | V156 | V157 | V158 | V159 | V160 | V161 | V162 | V163 | V164 | V165 | V166 | V167 | V168 | V169 | V170 | V171 | V172 | V173 | V174 | V175 | V176 | V177 | V178 | V179 | V180 | V181 | V182 | V183 | V184 | V185 | V186 | V187 | V188 | V189 | V190 | V191 | V192 | V193 | V194 | V195 | V196 | V197 | V198 | V199 | V200 | V201 | V202 | V203 | V204 | V205 | V206 | V207 | V208 | V209 | V210 | V211 | V212 | V213 | V214 | V215 | V216 | V217 | V218 | V219 | V220 | V221 | V222 | V223 | V224 | V225 | V226 | V227 | V228 | V229 | V230 | V231 | V232 | V233 | V234 | V235 | V236 | V237 | V238 | V239 | V240 | V241 | V242 | V243 | V244 | V245 | V246 | V247 | V248 | V249 | V250 | V251 | V252 | V253 | V254 | V255 | V256 | V257 | V258 | V259 | V260 | V261 | V262 | V263 | V264 | V265 | V266 | V267 | V268 | V269 | V270 | V271 | V272 | V273 | V274 | V275 | V276 | V277 | V278 | V279 | V280 | V281 | V282 | V283 | V284 | V285 | V286 | V287 | V288 | V289 | V290 | V291 | V292 | V293 | V294 | V295 | V296 | V297 | V298 | V299 | V300 | V301 | V302 | V303 | V304 | V305 | V306 | V307 | V308 | V309 | V310 | V311 | V312 | V313 | V314 | V315 | V316 | V317 | V318 | V319 | V320 | V321 | V322 | V323 | V324 | V325 | V326 | V327 | V328 | V329 | V330 | V331 | V332 | V333 | V334 | V335 | V336 | V337 | V338 | V339 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 590540 | 590540 | 590540 | 590540 | NaN | 590540 | 581607 | 588975 | NaN | 586281 | NaN | 524834 | 524834 | 238269 | 37627 | NaN | NaN | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 589271 | 309743 | 327662 | 421618 | 280699 | 73187 | 38917 | 74926 | 74926 | 514518 | 311253 | 64717 | 61952 | 62187 | 501427 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 81945 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 81945 | 81945 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 139631 | 139631 | 139819 | 139819 | 139819 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139819 | 139631 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139819 | 139819 | 139631 | 139819 | 139819 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139631 | 139631 | 139819 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139631 | 139631 | 130430 | 130430 | 130430 | 141416 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 590528 | 590528 | 589271 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 589271 | 590528 | 590528 | 590528 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 589271 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 |
| mean | 3.28227e+06 | 0.03499 | 7.37231e+06 | NaN | NaN | 9898.73 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | inf | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | inf | NaN | 0 | NaN | NaN | inf | inf | inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 129.979 | 336.612 | 204.094 | NaN | NaN | NaN | 103.513 | 204.889 | 145.972 | 17.2501 | 38.8212 | 26.3651 | 0 | NaN | NaN | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 47453.2 | NaN | NaN | NaN | 877.889 | 2239.91 | 359.469 | NaN | NaN | 0 | NaN | NaN | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 444.147 | 1078.33 | 686.957 | NaN | NaN | NaN | NaN | NaN | NaN | 385.137 | 765.988 | 536.303 | 38.4375 | 133.208 | 71.1071 | NaN | NaN | NaN | 0 | NaN | NaN | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 117.391 | 201.658 | 153.521 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 107.152 | NaN | 31.7973 | 51.9566 | 42.3282 | NaN | NaN | 0 | NaN | NaN | 0 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | 0 | 0 | NaN | NaN | NaN | NaN | 139.749 | 408.682 | 230.413 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 109.819 | 247.607 | 162.153 | 18.3725 | 42.0731 | 28.3266 | NaN | NaN | NaN | 0 | NaN | 0 | 0 | NaN | 0 | 721.742 | 1375.78 | 1014.62 | NaN | NaN | NaN | 55.3524 | 151.161 | 100.701 |
| std | 170474 | 0.183755 | 4.61722e+06 | NaN | NaN | 4901.17 | NaN | 0 | NaN | 0 | NaN | NaN | 0 | NaN | inf | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | inf | NaN | 0 | NaN | NaN | inf | inf | inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2346.95 | 4238.67 | 3010.26 | NaN | NaN | NaN | 2266.11 | 3796.32 | 2772.99 | 293.848 | 451.808 | 348.333 | 0 | 0 | 0 | 0 | 0 | NaN | 0 | NaN | 0 | 0 | 0 | 0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 142076 | NaN | NaN | NaN | 6049.17 | 8223.26 | 1244.46 | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4683.83 | 9105.61 | 6048.98 | NaN | NaN | NaN | NaN | NaN | NaN | 4541.84 | 7496.12 | 5471.66 | 571.834 | 1040.45 | 680.268 | NaN | NaN | NaN | 0 | NaN | NaN | 0 | 0 | 0 | 0 | NaN | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 0 | 0 | 0 | 1294.85 | 2284.83 | 1605.51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1258.73 | NaN | 615.66 | 732.145 | 660.612 | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2348.85 | 4391.99 | 3021.92 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2270.03 | 3980.04 | 2793.34 | 332.305 | 473.499 | 382.053 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 6217.22 | 11169.3 | 7955.74 | NaN | NaN | NaN | 668.487 | 1095.03 | 814.947 |
| min | 2.987e+06 | 0 | 86400 | 0.250977 | NaN | 1000 | 100 | 100 | NaN | 100 | NaN | 100 | 10 | 0 | 0 | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -122 | 0 | -83 | 0 | 0 | 0 | 0 | -53 | -83 | 0 | -193 | -83 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25% | 3.13463e+06 | 0 | 3.02706e+06 | 43.3125 | NaN | 6019 | 214 | 150 | NaN | 166 | NaN | 204 | 87 | 3 | 7 | NaN | NaN | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 26 | 1 | 0 | 1 | 0 | 0 | 0.958496 | 0.208374 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 50% | 3.28227e+06 | 0 | 7.30653e+06 | 68.75 | NaN | 9678 | 361 | 150 | NaN | 226 | NaN | 299 | 87 | 8 | 37 | NaN | NaN | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 3 | 1 | 3 | 97 | 8 | 26 | 10 | 0 | 0 | 37.875 | 0.666504 | 15 | 43 | 0 | 0 | 0 | 52 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 75% | 3.4299e+06 | 0 | 1.12466e+07 | 125 | NaN | 14184 | 512 | 150 | NaN | 226 | NaN | 330 | 87 | 24 | 206 | NaN | NaN | 3 | 3 | 0 | 0 | 1 | 2 | 0 | 0 | 2 | 0 | 2 | 0 | 12 | 2 | 122 | 276 | 27 | 253 | 32 | 40 | 17 | 188 | 0.833496 | 197 | 274 | 13 | 0 | 2 | 314 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 107.95 | 0 | 0 | 59 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 30.9244 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 33.5935 | 20.8975 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 151.381 | 35.97 | 0 | 107.938 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| max | 3.57754e+06 | 1 | 1.58111e+07 | 31936 | NaN | 18396 | 600 | 231 | NaN | 237 | NaN | 540 | 102 | 10288 | 11624 | NaN | NaN | 4684 | 5692 | 26 | 2252 | 349 | 2252 | 2256 | 3332 | 210 | 3256 | 3188 | 3188 | 2918 | 1429 | 640 | 640 | 819 | 869 | 819 | 873 | 843 | 1708 | 0.958496 | 876 | 670 | 648 | 847 | 878 | 879 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 8 | 9 | 6 | 6 | 9 | 9 | 8 | 8 | 4 | 5 | 3 | 6 | 1 | 7 | 15 | 15 | 15 | 7 | 15 | 5 | 8 | 13 | 13 | 7 | 13 | 4 | 4 | 5 | 9 | 7 | 15 | 7 | 13 | 3 | 5 | 54 | 54 | 15 | 24 | 1 | 8 | 8 | 48 | 48 | 6 | 12 | 5 | 5 | 5 | 6 | 12 | 5 | 6 | 17 | 51 | 6 | 10 | 16 | 16 | 6 | 10 | 7 | 7 | 1 | 7 | 8 | 2 | 5 | 6 | 6 | 10 | 7 | 8 | 4 | 6 | 30 | 31 | 7 | 19 | 19 | 7 | 7 | 7 | 7 | 30 | 30 | 1 | 2 | 5 | 6 | 7 | 7 | 2 | 880 | 1410 | 976 | 12 | 88 | 28 | 869 | 1285 | 928 | 15 | 99 | 55 | 1 | 7 | 7 | 7 | 9 | 9 | 9 | 6 | 6 | 6 | 3 | 3 | 3 | 3 | 3 | 3 | 13 | 13 | 13 | 160000 | 160000 | 160000 | 55136 | 55136 | 55136 | 93736 | 133915 | 98476 | 90750 | 90750 | 90750 | 22 | 33 | 33 | 5 | 9 | 869 | 62 | 297 | 24 | 26 | 20 | 20 | 3388 | 57 | 69 | 18 | 18 | 24 | 24 | 24 | 24 | 55136 | 641511 | 3300 | 3300 | 3300 | 93736 | 98476 | 104060 | 872 | 964 | 19 | 48 | 61 | 31 | 7 | 8 | 14 | 48 | 861 | 1235 | 920 | 83 | 24 | 83 | 41 | 16 | 31 | 38 | 218 | 30 | 30 | 42 | 21 | 44 | 37 | 7 | 16 | 38 | 14 | 21 | 45 | 45 | 55 | 104060 | 139777 | 104060 | 55136 | 55136 | 55136 | 3300 | 8048 | 3300 | 92888 | 129006 | 97628 | 104060 | 104060 | 104060 | 303 | 400 | 378 | 25 | 384 | 384 | 16 | 144 | 51 | 242 | 360 | 54 | 176 | 65 | 293 | 337 | 332 | 121 | 23 | 45 | 39 | 23 | 23 | 7 | 5 | 20 | 57 | 22 | 262 | 45 | 18 | 36 | 22 | 18 | 18 | 24 | 163 | 60 | 87 | 87 | 48 | 66 | 285 | 8 | 49 | 20 | 153600 | 153600 | 153600 | 55136 | 55136 | 55136 | 55136 | 4000 | 4000 | 4000 | 51200 | 66000 | 51200 | 104060 | 104060 | 104060 | 880 | 975 | 22 | 32 | 68 | 12 | 95 | 8 | 31 | 10 | 12 | 67 | 1055 | 323 | 869 | 1286 | 928 | 93 | 12 | 93 | 49 | 11 | 13 | 16 | 20 | 16 | 2 | 108800 | 145765 | 108800 | 55136 | 55136 | 55136 | 55136 | 4816 | 7520 | 4816 | 93736 | 134021 | 98476 | 104060 | 104060 | 104060 | 880 | 1411 | 976 | 12 | 44 | 18 | 15 | 99 | 55 | 160000 | 160000 | 160000 | 55136 | 55136 | 55136 | 104060 | 104060 | 104060 |
| counts | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 581607 | 588975 | 588963 | 586281 | 588969 | 524834 | 524834 | 238269 | 37627 | 496084 | 137291 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 590540 | 589271 | 309743 | 327662 | 421618 | 280699 | 73187 | 38917 | 74926 | 74926 | 514518 | 311253 | 64717 | 61952 | 62187 | 501427 | 319440 | 319440 | 319440 | 309096 | 240058 | 421180 | 244275 | 244288 | 244288 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 311253 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 514467 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 421571 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 513444 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 501376 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 590226 | 81945 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 81945 | 81945 | 81945 | 81945 | 81945 | 81945 | 81951 | 81951 | 81945 | 81945 | 81945 | 81951 | 81951 | 81951 | 139631 | 139631 | 139819 | 139819 | 139819 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139819 | 139631 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139819 | 139819 | 139631 | 139819 | 139819 | 139631 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139631 | 139631 | 139819 | 139819 | 139819 | 139631 | 139631 | 139631 | 139631 | 139631 | 139631 | 130430 | 130430 | 130430 | 141416 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 130430 | 141416 | 141416 | 130430 | 130430 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 141416 | 141416 | 141416 | 130430 | 130430 | 130430 | 130430 | 130430 | 130430 | 590528 | 590528 | 589271 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 589271 | 590528 | 590528 | 590528 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 589271 | 589271 | 589271 | 590528 | 590528 | 590528 | 590528 | 590528 | 590528 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 | 82351 |
| uniques | 590540 | 2 | 573349 | 8195 | 5 | 13553 | 500 | 114 | 4 | 119 | 4 | 332 | 74 | 2412 | 1699 | 59 | 60 | 1495 | 1167 | 27 | 1223 | 319 | 1291 | 1069 | 1130 | 205 | 1122 | 1343 | 1066 | 1464 | 1108 | 641 | 641 | 649 | 808 | 688 | 829 | 597 | 5367 | 24 | 818 | 676 | 635 | 577 | 802 | 859 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 9 | 10 | 7 | 7 | 10 | 10 | 9 | 9 | 5 | 6 | 4 | 7 | 2 | 8 | 15 | 16 | 16 | 8 | 15 | 6 | 9 | 14 | 14 | 7 | 13 | 4 | 4 | 6 | 8 | 8 | 15 | 7 | 13 | 4 | 6 | 55 | 55 | 16 | 18 | 2 | 9 | 9 | 49 | 49 | 7 | 9 | 6 | 6 | 6 | 7 | 9 | 6 | 7 | 18 | 52 | 7 | 11 | 17 | 17 | 7 | 11 | 8 | 8 | 2 | 8 | 9 | 3 | 6 | 7 | 7 | 11 | 8 | 9 | 5 | 7 | 31 | 32 | 8 | 20 | 20 | 8 | 8 | 8 | 8 | 31 | 31 | 2 | 3 | 6 | 7 | 8 | 8 | 3 | 881 | 1410 | 976 | 13 | 89 | 29 | 870 | 1285 | 928 | 16 | 100 | 56 | 2 | 8 | 8 | 8 | 10 | 10 | 10 | 7 | 7 | 7 | 4 | 4 | 4 | 4 | 4 | 4 | 14 | 14 | 14 | 10299 | 24414 | 14507 | 1608 | 5511 | 3097 | 6560 | 9949 | 8178 | 3724 | 4852 | 4252 | 23 | 34 | 34 | 6 | 10 | 870 | 63 | 260 | 25 | 27 | 21 | 21 | 1344 | 56 | 39 | 19 | 19 | 25 | 25 | 25 | 25 | 2492 | 9621 | 79 | 185 | 106 | 1978 | 2547 | 987 | 873 | 965 | 20 | 49 | 62 | 32 | 8 | 9 | 15 | 49 | 862 | 1236 | 921 | 84 | 25 | 84 | 42 | 17 | 32 | 39 | 215 | 31 | 31 | 43 | 22 | 45 | 38 | 8 | 17 | 39 | 15 | 22 | 46 | 46 | 56 | 10970 | 14951 | 12858 | 1953 | 1581 | 2705 | 2093 | 2674 | 2262 | 7624 | 8868 | 8317 | 2282 | 2747 | 2532 | 304 | 401 | 379 | 26 | 77 | 76 | 17 | 79 | 35 | 81 | 50 | 55 | 91 | 66 | 294 | 338 | 333 | 122 | 24 | 46 | 40 | 24 | 24 | 6 | 5 | 21 | 43 | 23 | 58 | 46 | 19 | 23 | 23 | 19 | 19 | 25 | 66 | 45 | 46 | 48 | 49 | 67 | 68 | 9 | 41 | 21 | 10422 | 13358 | 11757 | 1871 | 2884 | 2286 | 151 | 1972 | 2286 | 2082 | 4689 | 8315 | 4965 | 2263 | 2540 | 2398 | 881 | 975 | 23 | 33 | 62 | 13 | 96 | 9 | 32 | 11 | 13 | 58 | 219 | 173 | 870 | 1286 | 928 | 94 | 13 | 94 | 50 | 12 | 14 | 17 | 21 | 17 | 2 | 16210 | 37367 | 23064 | 3239 | 7759 | 2526 | 5143 | 3915 | 5974 | 4540 | 9814 | 15184 | 12309 | 4799 | 6439 | 5560 | 881 | 1411 | 976 | 13 | 45 | 19 | 16 | 100 | 56 | 1758 | 2453 | 1971 | 143 | 669 | 355 | 254 | 380 | 334 |
| missing | 0 | 0 | 0 | 0 | 0 | 0 | 8933 | 1565 | 1577 | 4259 | 1571 | 65706 | 65706 | 352271 | 552913 | 94456 | 453249 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1269 | 280797 | 262878 | 168922 | 309841 | 517353 | 551623 | 515614 | 515614 | 76022 | 279287 | 525823 | 528588 | 528353 | 89113 | 271100 | 271100 | 271100 | 281444 | 350482 | 169360 | 346265 | 346252 | 346252 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 279287 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 76073 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 168969 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 77096 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 89164 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 314 | 508595 | 508595 | 508595 | 508595 | 508595 | 508589 | 508589 | 508589 | 508595 | 508595 | 508595 | 508595 | 508589 | 508589 | 508589 | 508595 | 508595 | 508595 | 508595 | 508595 | 508595 | 508589 | 508589 | 508595 | 508595 | 508595 | 508589 | 508589 | 508589 | 450909 | 450909 | 450721 | 450721 | 450721 | 450909 | 450909 | 450721 | 450721 | 450909 | 450909 | 450909 | 450909 | 450721 | 450909 | 450909 | 450909 | 450721 | 450721 | 450909 | 450909 | 450721 | 450721 | 450909 | 450909 | 450909 | 450909 | 450721 | 450721 | 450909 | 450721 | 450721 | 450909 | 450721 | 450721 | 450909 | 450909 | 450909 | 450909 | 450909 | 450909 | 450721 | 450721 | 450721 | 450909 | 450909 | 450909 | 450909 | 450909 | 450909 | 460110 | 460110 | 460110 | 449124 | 449124 | 449124 | 460110 | 460110 | 460110 | 460110 | 449124 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 449124 | 460110 | 460110 | 460110 | 449124 | 449124 | 460110 | 460110 | 460110 | 460110 | 460110 | 449124 | 460110 | 460110 | 460110 | 460110 | 449124 | 449124 | 460110 | 460110 | 460110 | 449124 | 449124 | 460110 | 460110 | 449124 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 449124 | 449124 | 449124 | 460110 | 460110 | 460110 | 460110 | 460110 | 460110 | 12 | 12 | 1269 | 1269 | 1269 | 12 | 12 | 12 | 12 | 1269 | 1269 | 12 | 12 | 12 | 12 | 12 | 12 | 1269 | 12 | 12 | 12 | 1269 | 1269 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 1269 | 1269 | 1269 | 12 | 12 | 12 | 12 | 12 | 12 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 | 508189 |
| missing_perc | 0% | 0% | 0% | 0% | 0% | 0% | 1.51% | 0.27% | 0.27% | 0.72% | 0.27% | 11.13% | 11.13% | 59.65% | 93.63% | 15.99% | 76.75% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0.21% | 47.55% | 44.51% | 28.60% | 52.47% | 87.61% | 93.41% | 87.31% | 87.31% | 12.87% | 47.29% | 89.04% | 89.51% | 89.47% | 15.09% | 45.91% | 45.91% | 45.91% | 47.66% | 59.35% | 28.68% | 58.64% | 58.63% | 58.63% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 47.29% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 12.88% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 28.61% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 13.06% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 15.10% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 86.12% | 76.36% | 76.36% | 76.32% | 76.32% | 76.32% | 76.36% | 76.36% | 76.32% | 76.32% | 76.36% | 76.36% | 76.36% | 76.36% | 76.32% | 76.36% | 76.36% | 76.36% | 76.32% | 76.32% | 76.36% | 76.36% | 76.32% | 76.32% | 76.36% | 76.36% | 76.36% | 76.36% | 76.32% | 76.32% | 76.36% | 76.32% | 76.32% | 76.36% | 76.32% | 76.32% | 76.36% | 76.36% | 76.36% | 76.36% | 76.36% | 76.36% | 76.32% | 76.32% | 76.32% | 76.36% | 76.36% | 76.36% | 76.36% | 76.36% | 76.36% | 77.91% | 77.91% | 77.91% | 76.05% | 76.05% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 76.05% | 77.91% | 77.91% | 77.91% | 76.05% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 76.05% | 76.05% | 77.91% | 77.91% | 77.91% | 76.05% | 76.05% | 77.91% | 77.91% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 76.05% | 76.05% | 76.05% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 77.91% | 0.00% | 0.00% | 0.21% | 0.21% | 0.21% | 0.00% | 0.00% | 0.00% | 0.00% | 0.21% | 0.21% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.21% | 0.00% | 0.00% | 0.00% | 0.21% | 0.21% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.21% | 0.21% | 0.21% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% | 86.05% |
| types | numeric | bool | numeric | numeric | categorical | numeric | numeric | numeric | categorical | numeric | categorical | numeric | numeric | numeric | numeric | categorical | categorical | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | bool | bool | categorical | bool | bool | bool | bool | bool | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | bool | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric |
Check class imbalance
df_tran.loc[:, 'isFraud'].value_counts()
0 569877 1 20663 Name: isFraud, dtype: int64
Lot of interesting things can be observed here:
# Merge transaction dataset and identity dataset
df = df_tran.merge(df_id, how='left', left_index=True, right_index=True, on='TransactionID')
del df_tran, df_id
gc.collect()
0
Get dimensions of training dataset
# Dimentions of data
df.shape
(590540, 434)
Since left join was performed on transaction dataset, number of rows are same as transaction dataset.
# Add flag column for missing values
for col in df.columns:
df[col+"_missing_flag"] = df[col].isnull()
df.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | dist2 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | V29 | V30 | V31 | V32 | V33 | V34 | V35 | V36 | V37 | V38 | V39 | V40 | V41 | V42 | V43 | V44 | V45 | V46 | V47 | V48 | V49 | V50 | V51 | V52 | V53 | V54 | V55 | V56 | V57 | V58 | V59 | V60 | V61 | V62 | V63 | V64 | V65 | V66 | V67 | V68 | V69 | V70 | V71 | V72 | V73 | V74 | V75 | V76 | V77 | V78 | V79 | V80 | V81 | V82 | V83 | V84 | V85 | V86 | V87 | V88 | V89 | V90 | V91 | V92 | V93 | V94 | V95 | V96 | V97 | V98 | V99 | V100 | V101 | V102 | V103 | V104 | V105 | V106 | V107 | V108 | V109 | V110 | V111 | V112 | V113 | V114 | V115 | V116 | V117 | V118 | V119 | V120 | V121 | V122 | V123 | V124 | V125 | V126 | V127 | V128 | V129 | V130 | V131 | V132 | V133 | V134 | V135 | V136 | V137 | V138 | V139 | V140 | V141 | V142 | V143 | V144 | V145 | V146 | V147 | V148 | V149 | V150 | V151 | V152 | V153 | V154 | V155 | V156 | V157 | V158 | V159 | V160 | V161 | V162 | V163 | V164 | V165 | V166 | V167 | V168 | V169 | V170 | V171 | V172 | V173 | V174 | V175 | V176 | V177 | V178 | V179 | V180 | V181 | V182 | V183 | V184 | V185 | V186 | V187 | V188 | V189 | V190 | V191 | V192 | V193 | V194 | V195 | ... | V130_missing_flag | V131_missing_flag | V132_missing_flag | V133_missing_flag | V134_missing_flag | V135_missing_flag | V136_missing_flag | V137_missing_flag | V138_missing_flag | V139_missing_flag | V140_missing_flag | V141_missing_flag | V142_missing_flag | V143_missing_flag | V144_missing_flag | V145_missing_flag | V146_missing_flag | V147_missing_flag | V148_missing_flag | V149_missing_flag | V150_missing_flag | V151_missing_flag | V152_missing_flag | V153_missing_flag | V154_missing_flag | V155_missing_flag | V156_missing_flag | V157_missing_flag | V158_missing_flag | V159_missing_flag | V160_missing_flag | V161_missing_flag | V162_missing_flag | V163_missing_flag | V164_missing_flag | V165_missing_flag | V166_missing_flag | V167_missing_flag | V168_missing_flag | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 68.5 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NaN | NaN | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | True | True | True | True | True | True | False | False | True | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | 2987001 | 0 | 86401 | 29.0 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NaN | gmail.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | 2987002 | 0 | 86469 | 59.0 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | NaN | outlook.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | False |
| 3 | 2987003 | 0 | 86499 | 50.0 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | NaN | yahoo.com | NaN | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 48.0 | 28.0 | 0.0 | 10.0 | 4.0 | 1.0 | 38.0 | 24.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 50.0 | 1758.0 | 925.0 | 0.0 | 354.0 | 135.0 | 50.0 | 1404.0 | 790.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | True |
| 4 | 2987004 | 0 | 86506 | 50.0 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NaN | gmail.com | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 18.0 | 140.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1803.0 | 49.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 15560.0 | 169690.796875 | 0.0 | 0.0 | 0.0 | 515.0 | 5155.0 | 2840.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | False | True | False | False | False | False | False | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False |
5 rows × 868 columns
Let's drop the columns which may not be useful for our analysis
Create a missing value flag column for the columns we are dropping which have more than 90% missing values, there might be some specific pattern associated with missing values and transaction being fraud
# Drop the columns where one category contains more than 90% values
drop_cols = []
for col in df.columns:
missing_share = df[col].isnull().sum()/df.shape[0]
if missing_share > 0.9:
drop_cols.append(col)
print(col)
# df[col + "_missing_flag"] = df[col].isnull()
good_cols = [col for col in df.columns if col not in drop_cols]
dist2 D7 id_07 id_08 id_18 id_21 id_22 id_23 id_24 id_25 id_26 id_27
Remove the columns which doesn't having any variance
# Drop the columns which have only one unique value
drop_cols = []
for col in good_cols:
unique_value = df[col].nunique()
if unique_value == 1:
drop_cols.append(col)
print(col)
good_cols = [col for col in good_cols if col not in drop_cols]
TransactionID_missing_flag isFraud_missing_flag TransactionDT_missing_flag TransactionAmt_missing_flag ProductCD_missing_flag card1_missing_flag C1_missing_flag C2_missing_flag C3_missing_flag C4_missing_flag C5_missing_flag C6_missing_flag C7_missing_flag C8_missing_flag C9_missing_flag C10_missing_flag C11_missing_flag C12_missing_flag C13_missing_flag C14_missing_flag
Filter the dataset with only good columns
# Filter the data for relevant columns only
df = df[good_cols]
Get dimentions of training dataset
# Dimentions of data
df.shape
(590540, 836)
Let's create date features from TransactionDT features
# Date features
START_DATE = '2017-12-01'
startdate = datetime.datetime.strptime(START_DATE, "%Y-%m-%d")
df["Date"] = df['TransactionDT'].apply(lambda x: (startdate + datetime.timedelta(seconds=x)))
df['_Weekdays'] = df['Date'].dt.dayofweek
df['_Hours'] = df['Date'].dt.hour
df['_Days'] = df['Date'].dt.day
df = reduce_mem_usage(df)
Mem. usage decreased to 1449.38 Mb (0.8% reduction)
Exploratory data analysis is an approach to analyze or investigate data sets to find out patterns and see if any of the variables can be useful to explain / predict the y variables.
Visual methods are often used to summarise the data. Primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing tasks.
The goal of this section is to:
# Get count of target class
df['isFraud'].value_counts()
0 569877 1 20663 Name: isFraud, dtype: int64
Let's check the distribution of target class using a bar plot and check the proportion of transactions amounts being fraud
# Draw a countplot to check the distribution of target variable
df['TransactionAmt'] = df['TransactionAmt'].astype(float)
total = len(df)
total_amt = df.groupby(['isFraud'])['TransactionAmt'].sum().sum()
plt.figure(figsize=(16,6))
plt.subplot(121)
g = sns.countplot(x='isFraud', data=df )
g.set_title("Fraud Transactions Distribution \n 0: No Fraud | 1: Fraud", fontsize=18)
g.set_xlabel("Is fraud?", fontsize=18)
g.set_ylabel('Count', fontsize=18)
for p in g.patches:
height = p.get_height()
g.text(p.get_x()+p.get_width()/2.,
height + 3,
'{:1.2f}%'.format(height/total*100),
ha="center", fontsize=15)
perc_amt = (df.groupby(['isFraud'])['TransactionAmt'].sum())
perc_amt = perc_amt.reset_index()
plt.subplot(122)
g1 = sns.barplot(x='isFraud', y='TransactionAmt', dodge=True, data=perc_amt)
g1.set_title("% Total Amount in Transaction Amt \n 0: No Fraud | 1: Fraud", fontsize=18)
g1.set_xlabel("Is fraud?", fontsize=18)
g1.set_ylabel('Total Transaction Amount Scalar', fontsize=18)
for p in g1.patches:
height = p.get_height()
g1.text(p.get_x()+p.get_width()/2.,
height + 3,
'{:1.2f}%'.format(height/total_amt * 100),
ha="center", fontsize=15)
plt.show()
# Average transaction amount by Y
df.groupby('isFraud')['TransactionAmt'].mean()
isFraud 0 134.511857 1 149.244353 Name: TransactionAmt, dtype: float64
Let's explore the Transaction amount further
# Distribution plot of Transaction Amount
plt.figure(figsize=(16,12))
sns.distplot(df['TransactionAmt'])
plt.title("Transaction Amount Distribution",fontsize=18)
plt.ylabel("Probability")
Text(0, 0.5, 'Probability')
There are certain transactions which are of very high amount, let's remove those to check the distribution
# Distribution plot of Transaction Amount less than 1000
plt.figure(figsize=(16,12))
plt.suptitle('Transaction Values Distribution', fontsize=22)
sns.distplot(df[df['TransactionAmt'] <= 1000]['TransactionAmt'])
plt.title("Transaction Amount Distribuition <= 1000", fontsize=18)
plt.xlabel("Transaction Amount", fontsize=15)
plt.ylabel("Probability", fontsize=15)
plt.show()
Most transactions lie in < $200 range
Transaction amount is right skewed.
Let's look at the log of transaction amount
# Distribution plot of Transaction Amount less than 1000
plt.figure(figsize=(16,12))
plt.suptitle('Transaction Values Distribution', fontsize=22)
sns.distplot(np.log(df['TransactionAmt']))
plt.title("Transaction Amount (Log) Distribuition", fontsize=18)
plt.xlabel("Transaction Amount", fontsize=15)
plt.ylabel("Probability", fontsize=15)
plt.show()
def plot_cat_feat_dist(df, col):
tmp = pd.crosstab(df[col], df['isFraud'], normalize='index') * 100
tmp = tmp.reset_index()
tmp.rename(columns={0:'NoFraud', 1:'Fraud'}, inplace=True)
plt.figure(figsize=(16,12))
plt.suptitle(f'{col} Distributions', fontsize=22)
plt.subplot(221)
g = sns.countplot(x=col, data=df, order=tmp[col].values)
g.set_title(f"{col} Distribution", fontsize=16)
g.set_xlabel(f"{col} Name", fontsize=17)
g.set_ylabel("Count", fontsize=17)
for p in g.patches:
height = p.get_height()
g.text(p.get_x()+p.get_width()/2.,
height + 3,
'{:1.2f}%'.format(height/total*100),
ha="center", fontsize=14)
plt.subplot(222)
g1 = sns.countplot(x=col, hue='isFraud', data=df, order=tmp[col].values)
plt.legend(title='Fraud', loc='best', labels=['No', 'Yes'])
gt = g1.twinx()
gt = sns.pointplot(x=col, y='Fraud', data=tmp, color='black', order=tmp[col].values, legend=False)
gt.set_ylabel("% of Fraud Transactions", fontsize=16)
g1.set_title(f"{col} Distribution by Target Variable (isFraud) ", fontsize=16)
g1.set_xlabel(f"{col} Name", fontsize=17)
g1.set_ylabel("Count", fontsize=17)
plt.subplots_adjust(hspace = 0.4, top = 0.85)
plt.show()
plot_cat_feat_dist(df, "ProductCD")
# Average fraud per transaction by ProductCD
df.groupby('ProductCD')['isFraud'].mean()
ProductCD C 0.116873 H 0.047662 R 0.037826 S 0.058996 W 0.020399 Name: isFraud, dtype: float64
# Card 4
plot_cat_feat_dist(df, "card4")
# Average fraud per transaction by Card4
df.groupby('card4')['isFraud'].mean()
card4 american express 0.028698 discover 0.077282 mastercard 0.034331 visa 0.034756 Name: isFraud, dtype: float64
# Card 6
plot_cat_feat_dist(df, "card6")
# Average fraud per transaction by Card6
df.groupby('card6')['isFraud'].mean()
card6 charge card 0.000000 credit 0.066785 debit 0.024263 debit or credit 0.000000 Name: isFraud, dtype: float64
df.loc[df['P_emaildomain'].isin(['gmail.com', 'gmail']),'P_emaildomain'] = 'Google'
df.loc[df['P_emaildomain'].isin(['yahoo.com', 'yahoo.com.mx', 'yahoo.co.uk',
'yahoo.co.jp', 'yahoo.de', 'yahoo.fr',
'yahoo.es']), 'P_emaildomain'] = 'Yahoo Mail'
df.loc[df['P_emaildomain'].isin(['hotmail.com','outlook.com','msn.com', 'live.com.mx',
'hotmail.es','hotmail.co.uk', 'hotmail.de',
'outlook.es', 'live.com', 'live.fr',
'hotmail.fr']), 'P_emaildomain'] = 'Microsoft'
df.loc[df.P_emaildomain.isin(df.P_emaildomain\
.value_counts()[df.P_emaildomain.value_counts() <= 500 ]\
.index), 'P_emaildomain'] = "Others"
df.P_emaildomain.fillna("NoInf", inplace=True)
def plot_cat_with_amt(df, col, lim=2000):
tmp = pd.crosstab(df[col], df['isFraud'], normalize='index') * 100
tmp = tmp.reset_index()
tmp.rename(columns={0:'NoFraud', 1:'Fraud'}, inplace=True)
plt.figure(figsize=(16,14))
plt.suptitle(f'{col} Distributions ', fontsize=24)
plt.subplot(211)
g = sns.countplot( x=col, data=df, order=list(tmp[col].values))
gt = g.twinx()
gt = sns.pointplot(x=col, y='Fraud', data=tmp, order=list(tmp[col].values),
color='black', legend=False, )
gt.set_ylim(0,tmp['Fraud'].max()*1.1)
gt.set_ylabel("%Fraud Transactions", fontsize=16)
g.set_title(f"Share of {col} categories and % of Fraud Transactions", fontsize=18)
g.set_xlabel(f"{col} Category Names", fontsize=16)
g.set_ylabel("Count", fontsize=17)
g.set_xticklabels(g.get_xticklabels(),rotation=45)
sizes = []
for p in g.patches:
height = p.get_height()
sizes.append(height)
g.text(p.get_x()+p.get_width()/2.,
height + 3,
'{:1.2f}%'.format(height/total*100),
ha="center",fontsize=12)
g.set_ylim(0,max(sizes)*1.15)
#########################################################################
perc_amt = (df.groupby(['isFraud',col])['TransactionAmt'].sum() \
/ df.groupby([col])['TransactionAmt'].sum() * 100).unstack('isFraud')
perc_amt = perc_amt.reset_index()
perc_amt.rename(columns={0:'NoFraud', 1:'Fraud'}, inplace=True)
amt = df.groupby([col])['TransactionAmt'].sum().reset_index()
perc_amt = perc_amt.fillna(0)
plt.subplot(212)
g1 = sns.barplot(x=col, y='TransactionAmt',
data=amt,
order=list(tmp[col].values))
g1t = g1.twinx()
g1t = sns.pointplot(x=col, y='Fraud', data=perc_amt,
order=list(tmp[col].values),
color='black', legend=False, )
g1t.set_ylim(0,perc_amt['Fraud'].max()*1.1)
g1t.set_ylabel("%Fraud Total Amount", fontsize=16)
g.set_xticklabels(g.get_xticklabels(),rotation=45)
g1.set_title(f"Transactions amount by {col} categories and % of Fraud Transactions (Amounts)", fontsize=18)
g1.set_xlabel(f"{col} Category Names", fontsize=16)
g1.set_ylabel("Transaction Total Amount(U$)", fontsize=16)
g1.set_xticklabels(g.get_xticklabels(),rotation=45)
for p in g1.patches:
height = p.get_height()
g1.text(p.get_x()+p.get_width()/2.,
height + 3,
'{:1.2f}%'.format(height/total_amt*100),
ha="center",fontsize=12)
plt.subplots_adjust(hspace=.4, top = 0.9)
plt.show()
plot_cat_with_amt(df, 'P_emaildomain')
# Average fraud per transaction by Card6
df.groupby('P_emaildomain')['isFraud'].mean()
P_emaildomain aim.com 0.126984 anonymous.com 0.023217 aol.com 0.021811 att.net 0.007439 bellsouth.net 0.027763 cableone.net 0.018868 centurylink.net 0.000000 cfl.rr.com 0.000000 charter.net 0.030637 comcast.net 0.031187 cox.net 0.020818 earthlink.net 0.021401 embarqmail.com 0.034615 frontier.com 0.028571 frontiernet.net 0.025641 gmail 0.022177 gmail.com 0.043542 gmx.de 0.000000 hotmail.co.uk 0.000000 hotmail.com 0.052950 hotmail.de 0.000000 hotmail.es 0.065574 hotmail.fr 0.000000 icloud.com 0.031434 juno.com 0.018634 live.com 0.027622 live.com.mx 0.054740 live.fr 0.000000 mac.com 0.032110 mail.com 0.189624 me.com 0.017740 msn.com 0.021994 netzero.com 0.000000 netzero.net 0.005102 optonline.net 0.016815 outlook.com 0.094584 outlook.es 0.130137 prodigy.net.mx 0.004831 protonmail.com 0.407895 ptd.net 0.000000 q.com 0.000000 roadrunner.com 0.009836 rocketmail.com 0.003012 sbcglobal.net 0.004040 sc.rr.com 0.006098 servicios-ta.com 0.000000 suddenlink.net 0.022857 twc.com 0.000000 verizon.net 0.008133 web.de 0.000000 windstream.net 0.000000 yahoo.co.jp 0.000000 yahoo.co.uk 0.000000 yahoo.com 0.022757 yahoo.com.mx 0.010369 yahoo.de 0.000000 yahoo.es 0.014925 yahoo.fr 0.034965 ymail.com 0.020868 Name: isFraud, dtype: float64
df.loc[df['R_emaildomain'].isin(['gmail.com', 'gmail']),'R_emaildomain'] = 'Google'
df.loc[df['R_emaildomain'].isin(['yahoo.com', 'yahoo.com.mx', 'yahoo.co.uk',
'yahoo.co.jp', 'yahoo.de', 'yahoo.fr',
'yahoo.es']), 'R_emaildomain'] = 'Yahoo Mail'
df.loc[df['R_emaildomain'].isin(['hotmail.com','outlook.com','msn.com', 'live.com.mx',
'hotmail.es','hotmail.co.uk', 'hotmail.de',
'outlook.es', 'live.com', 'live.fr',
'hotmail.fr']), 'R_emaildomain'] = 'Microsoft'
df.loc[df.R_emaildomain.isin(df.R_emaildomain\
.value_counts()[df.R_emaildomain.value_counts() <= 300 ]\
.index), 'R_emaildomain'] = "Others"
df.R_emaildomain.fillna("NoInf", inplace=True)
plot_cat_with_amt(df, 'R_emaildomain')
# Average fraud per transaction by Card6
df.groupby('R_emaildomain')['isFraud'].mean()
R_emaildomain aim.com 0.027778 anonymous.com 0.029130 aol.com 0.034855 att.net 0.000000 bellsouth.net 0.004739 cableone.net 0.000000 centurylink.net 0.000000 cfl.rr.com 0.000000 charter.net 0.039370 comcast.net 0.011589 cox.net 0.023965 earthlink.net 0.075949 embarqmail.com 0.000000 frontier.com 0.000000 frontiernet.net 0.000000 gmail 0.000000 gmail.com 0.119184 gmx.de 0.000000 hotmail.co.uk 0.000000 hotmail.com 0.077793 hotmail.de 0.000000 hotmail.es 0.068493 hotmail.fr 0.000000 icloud.com 0.128755 juno.com 0.000000 live.com 0.049869 live.com.mx 0.058355 live.fr 0.000000 mac.com 0.009174 mail.com 0.377049 me.com 0.019784 msn.com 0.001174 netzero.com 0.000000 netzero.net 0.222222 optonline.net 0.010695 outlook.com 0.165138 outlook.es 0.131640 prodigy.net.mx 0.004831 protonmail.com 0.951220 ptd.net 0.000000 q.com 0.000000 roadrunner.com 0.000000 rocketmail.com 0.043478 sbcglobal.net 0.001812 sc.rr.com 0.000000 scranton.edu 0.000000 servicios-ta.com 0.000000 suddenlink.net 0.040000 twc.com 0.000000 verizon.net 0.000000 web.de 0.000000 windstream.net 0.000000 yahoo.co.jp 0.000000 yahoo.co.uk 0.000000 yahoo.com 0.051512 yahoo.com.mx 0.010610 yahoo.de 0.000000 yahoo.es 0.035088 yahoo.fr 0.036496 ymail.com 0.038647 Name: isFraud, dtype: float64
Reference date is not known, it has been assumed. So can't say concretely if the day number is correct
plot_cat_with_amt(df, '_Days')
The perc of fraud transactions is highest towards the beginning and the end of the month. Might be accelerated at the time of receiving pay-checks.
Incidentally, fraud transaction rate is high on the days when number of transactions are less
Day 29,30 and 31 are having less transactions, looks like people are cautious with spending in those times.
Reference date is not known, it has been assumed. So can't say concretely if the day number is correct
plot_cat_with_amt(df, '_Weekdays')
plot_cat_with_amt(df, '_Hours')
plot_cat_with_amt(df, "DeviceType")
for col in ['id_12', 'id_15', 'id_16', 'id_28', 'id_29']:
df[col] = df[col].fillna('NaN')
plot_cat_with_amt(df, col)
df.loc[df['id_30'].str.contains('Windows', na=False), 'id_30'] = 'Windows'
df.loc[df['id_30'].str.contains('iOS', na=False), 'id_30'] = 'iOS'
df.loc[df['id_30'].str.contains('Mac OS', na=False), 'id_30'] = 'Mac'
df.loc[df['id_30'].str.contains('Android', na=False), 'id_30'] = 'Android'
df['id_30'].fillna("NAN", inplace=True)
plot_cat_with_amt(df, "id_30")
df.loc[df['id_31'].str.contains('chrome', na=False), 'id_31'] = 'Chrome'
df.loc[df['id_31'].str.contains('firefox', na=False), 'id_31'] = 'Firefox'
df.loc[df['id_31'].str.contains('safari', na=False), 'id_31'] = 'Safari'
df.loc[df['id_31'].str.contains('edge', na=False), 'id_31'] = 'Edge'
df.loc[df['id_31'].str.contains('ie', na=False), 'id_31'] = 'IE'
df.loc[df['id_31'].str.contains('samsung', na=False), 'id_31'] = 'Samsung'
df.loc[df['id_31'].str.contains('opera', na=False), 'id_31'] = 'Opera'
df['id_31'].fillna("NAN", inplace=True)
df.loc[df.id_31.isin(df.id_31.value_counts()[df.id_31.value_counts() < 200].index), 'id_31'] = "Others"
plot_cat_with_amt(df, "id_31")
cat_columns = df.select_dtypes(include=['object']).columns
len(cat_columns)
29
binary_columns = [col for col in df.columns if df[col].nunique() == 2]
len(binary_columns)
435
num_columns = [col for col in df.columns if (col not in cat_columns) & (col not in binary_columns)]
len(num_columns)
389
cat_columns = cat_columns.to_list() + binary_columns
from scipy.stats import chi2_contingency
# significance value
alpha = 0.05
significant_categorical_variables = []
for col in cat_columns:
# Create a crosstab table
temp = pd.crosstab(df[col],df['isFraud'].astype('category'))
# Get chi-square value , p-value, degrees of freedom, expected frequencies using the function chi2_contingency
stat, p, dof, expected = chi2_contingency(temp)
# Determine whether to reject or keep your null hypothesis
print(col.ljust(40), ', chisquared=%.5f, p-value=%.5f' % (stat, p))
if p <= alpha:
significant_categorical_variables.append(col)
else:
""
ProductCD , chisquared=16742.17153, p-value=0.00000 card4 , chisquared=364.87414, p-value=0.00000 card6 , chisquared=5957.03229, p-value=0.00000 P_emaildomain , chisquared=3497.81283, p-value=0.00000 R_emaildomain , chisquared=17297.50859, p-value=0.00000 M1 , chisquared=0.00003, p-value=0.99581 M2 , chisquared=438.61321, p-value=0.00000 M3 , chisquared=477.66057, p-value=0.00000 M4 , chisquared=6450.44798, p-value=0.00000 M5 , chisquared=242.42169, p-value=0.00000 M6 , chisquared=227.96414, p-value=0.00000 M7 , chisquared=11.25610, p-value=0.00079 M8 , chisquared=88.53022, p-value=0.00000 M9 , chisquared=250.37250, p-value=0.00000 id_12 , chisquared=429.84996, p-value=0.00000 id_15 , chisquared=421.13420, p-value=0.00000 id_16 , chisquared=366.45700, p-value=0.00000 id_28 , chisquared=420.29657, p-value=0.00000 id_29 , chisquared=420.22938, p-value=0.00000 id_30 , chisquared=176.98386, p-value=0.00000 id_31 , chisquared=424.58675, p-value=0.00000 id_33 , chisquared=212.82695, p-value=0.98357 id_34 , chisquared=11.47415, p-value=0.00942 id_35 , chisquared=2.26392, p-value=0.13242 id_36 , chisquared=0.02303, p-value=0.87939 id_37 , chisquared=1.24550, p-value=0.26441 id_38 , chisquared=2.35329, p-value=0.12502 DeviceType , chisquared=0.39659, p-value=0.52885 DeviceInfo , chisquared=1476.08487, p-value=1.00000 isFraud , chisquared=590510.38453, p-value=0.00000 M1 , chisquared=0.00003, p-value=0.99581 M2 , chisquared=438.61321, p-value=0.00000 M3 , chisquared=477.66057, p-value=0.00000 M5 , chisquared=242.42169, p-value=0.00000 M6 , chisquared=227.96414, p-value=0.00000 M7 , chisquared=11.25610, p-value=0.00079 M8 , chisquared=88.53022, p-value=0.00000 M9 , chisquared=250.37250, p-value=0.00000 V1 , chisquared=0.08480, p-value=0.77090 V14 , chisquared=1.85823, p-value=0.17283 V41 , chisquared=6.45761, p-value=0.01105 V65 , chisquared=2.95009, p-value=0.08587 V88 , chisquared=0.06115, p-value=0.80468 V107 , chisquared=3.20035, p-value=0.07362 V305 , chisquared=0.95990, p-value=0.32721 id_35 , chisquared=2.26392, p-value=0.13242 id_36 , chisquared=0.02303, p-value=0.87939 id_37 , chisquared=1.24550, p-value=0.26441 id_38 , chisquared=2.35329, p-value=0.12502 DeviceType , chisquared=0.39659, p-value=0.52885 card2_missing_flag , chisquared=40.68296, p-value=0.00000 card3_missing_flag , chisquared=4.41810, p-value=0.03556 card4_missing_flag , chisquared=3.52353, p-value=0.06050 card5_missing_flag , chisquared=25.61819, p-value=0.00000 card6_missing_flag , chisquared=4.52321, p-value=0.03344 addr1_missing_flag , chisquared=15016.72347, p-value=0.00000 addr2_missing_flag , chisquared=15016.72347, p-value=0.00000 dist1_missing_flag , chisquared=2672.80512, p-value=0.00000 dist2_missing_flag , chisquared=4898.54088, p-value=0.00000 P_emaildomain_missing_flag , chisquared=98.80683, p-value=0.00000 R_emaildomain_missing_flag , chisquared=11593.85862, p-value=0.00000 D1_missing_flag , chisquared=0.02818, p-value=0.86669 D2_missing_flag , chisquared=1770.65814, p-value=0.00000 D3_missing_flag , chisquared=691.46860, p-value=0.00000 D4_missing_flag , chisquared=8.39702, p-value=0.00376 D5_missing_flag , chisquared=225.02626, p-value=0.00000 D6_missing_flag , chisquared=12282.70285, p-value=0.00000 D7_missing_flag , chisquared=15972.26519, p-value=0.00000 D8_missing_flag , chisquared=12263.96026, p-value=0.00000 D9_missing_flag , chisquared=12263.96026, p-value=0.00000 D10_missing_flag , chisquared=671.51048, p-value=0.00000 D11_missing_flag , chisquared=4605.06821, p-value=0.00000 D12_missing_flag , chisquared=14617.29683, p-value=0.00000 D13_missing_flag , chisquared=11641.55968, p-value=0.00000 D14_missing_flag , chisquared=13502.69648, p-value=0.00000 D15_missing_flag , chisquared=524.34621, p-value=0.00000 M1_missing_flag , chisquared=4720.57858, p-value=0.00000 M2_missing_flag , chisquared=4720.57858, p-value=0.00000 M3_missing_flag , chisquared=4720.57858, p-value=0.00000 M4_missing_flag , chisquared=4291.55730, p-value=0.00000 M5_missing_flag , chisquared=143.24698, p-value=0.00000 M6_missing_flag , chisquared=8958.36720, p-value=0.00000 M7_missing_flag , chisquared=2876.27247, p-value=0.00000 M8_missing_flag , chisquared=2876.92899, p-value=0.00000 M9_missing_flag , chisquared=2876.92899, p-value=0.00000 V1_missing_flag , chisquared=4605.06821, p-value=0.00000 V2_missing_flag , chisquared=4605.06821, p-value=0.00000 V3_missing_flag , chisquared=4605.06821, p-value=0.00000 V4_missing_flag , chisquared=4605.06821, p-value=0.00000 V5_missing_flag , chisquared=4605.06821, p-value=0.00000 V6_missing_flag , chisquared=4605.06821, p-value=0.00000 V7_missing_flag , chisquared=4605.06821, p-value=0.00000 V8_missing_flag , chisquared=4605.06821, p-value=0.00000 V9_missing_flag , chisquared=4605.06821, p-value=0.00000 V10_missing_flag , chisquared=4605.06821, p-value=0.00000 V11_missing_flag , chisquared=4605.06821, p-value=0.00000 V12_missing_flag , chisquared=670.26786, p-value=0.00000 V13_missing_flag , chisquared=670.26786, p-value=0.00000 V14_missing_flag , chisquared=670.26786, p-value=0.00000 V15_missing_flag , chisquared=670.26786, p-value=0.00000 V16_missing_flag , chisquared=670.26786, p-value=0.00000 V17_missing_flag , chisquared=670.26786, p-value=0.00000 V18_missing_flag , chisquared=670.26786, p-value=0.00000 V19_missing_flag , chisquared=670.26786, p-value=0.00000 V20_missing_flag , chisquared=670.26786, p-value=0.00000 V21_missing_flag , chisquared=670.26786, p-value=0.00000 V22_missing_flag , chisquared=670.26786, p-value=0.00000 V23_missing_flag , chisquared=670.26786, p-value=0.00000 V24_missing_flag , chisquared=670.26786, p-value=0.00000 V25_missing_flag , chisquared=670.26786, p-value=0.00000 V26_missing_flag , chisquared=670.26786, p-value=0.00000 V27_missing_flag , chisquared=670.26786, p-value=0.00000 V28_missing_flag , chisquared=670.26786, p-value=0.00000 V29_missing_flag , chisquared=670.26786, p-value=0.00000 V30_missing_flag , chisquared=670.26786, p-value=0.00000 V31_missing_flag , chisquared=670.26786, p-value=0.00000 V32_missing_flag , chisquared=670.26786, p-value=0.00000 V33_missing_flag , chisquared=670.26786, p-value=0.00000 V34_missing_flag , chisquared=670.26786, p-value=0.00000 V35_missing_flag , chisquared=8.24695, p-value=0.00408 V36_missing_flag , chisquared=8.24695, p-value=0.00408 V37_missing_flag , chisquared=8.24695, p-value=0.00408 V38_missing_flag , chisquared=8.24695, p-value=0.00408 V39_missing_flag , chisquared=8.24695, p-value=0.00408 V40_missing_flag , chisquared=8.24695, p-value=0.00408 V41_missing_flag , chisquared=8.24695, p-value=0.00408 V42_missing_flag , chisquared=8.24695, p-value=0.00408 V43_missing_flag , chisquared=8.24695, p-value=0.00408 V44_missing_flag , chisquared=8.24695, p-value=0.00408 V45_missing_flag , chisquared=8.24695, p-value=0.00408 V46_missing_flag , chisquared=8.24695, p-value=0.00408 V47_missing_flag , chisquared=8.24695, p-value=0.00408 V48_missing_flag , chisquared=8.24695, p-value=0.00408 V49_missing_flag , chisquared=8.24695, p-value=0.00408 V50_missing_flag , chisquared=8.24695, p-value=0.00408 V51_missing_flag , chisquared=8.24695, p-value=0.00408 V52_missing_flag , chisquared=8.24695, p-value=0.00408 V53_missing_flag , chisquared=1448.91324, p-value=0.00000 V54_missing_flag , chisquared=1448.91324, p-value=0.00000 V55_missing_flag , chisquared=1448.91324, p-value=0.00000 V56_missing_flag , chisquared=1448.91324, p-value=0.00000 V57_missing_flag , chisquared=1448.91324, p-value=0.00000 V58_missing_flag , chisquared=1448.91324, p-value=0.00000 V59_missing_flag , chisquared=1448.91324, p-value=0.00000 V60_missing_flag , chisquared=1448.91324, p-value=0.00000 V61_missing_flag , chisquared=1448.91324, p-value=0.00000 V62_missing_flag , chisquared=1448.91324, p-value=0.00000 V63_missing_flag , chisquared=1448.91324, p-value=0.00000 V64_missing_flag , chisquared=1448.91324, p-value=0.00000 V65_missing_flag , chisquared=1448.91324, p-value=0.00000 V66_missing_flag , chisquared=1448.91324, p-value=0.00000 V67_missing_flag , chisquared=1448.91324, p-value=0.00000 V68_missing_flag , chisquared=1448.91324, p-value=0.00000 V69_missing_flag , chisquared=1448.91324, p-value=0.00000 V70_missing_flag , chisquared=1448.91324, p-value=0.00000 V71_missing_flag , chisquared=1448.91324, p-value=0.00000 V72_missing_flag , chisquared=1448.91324, p-value=0.00000 V73_missing_flag , chisquared=1448.91324, p-value=0.00000 V74_missing_flag , chisquared=1448.91324, p-value=0.00000 V75_missing_flag , chisquared=522.48477, p-value=0.00000 V76_missing_flag , chisquared=522.48477, p-value=0.00000 V77_missing_flag , chisquared=522.48477, p-value=0.00000 V78_missing_flag , chisquared=522.48477, p-value=0.00000 V79_missing_flag , chisquared=522.48477, p-value=0.00000 V80_missing_flag , chisquared=522.48477, p-value=0.00000 V81_missing_flag , chisquared=522.48477, p-value=0.00000 V82_missing_flag , chisquared=522.48477, p-value=0.00000 V83_missing_flag , chisquared=522.48477, p-value=0.00000 V84_missing_flag , chisquared=522.48477, p-value=0.00000 V85_missing_flag , chisquared=522.48477, p-value=0.00000 V86_missing_flag , chisquared=522.48477, p-value=0.00000 V87_missing_flag , chisquared=522.48477, p-value=0.00000 V88_missing_flag , chisquared=522.48477, p-value=0.00000 V89_missing_flag , chisquared=522.48477, p-value=0.00000 V90_missing_flag , chisquared=522.48477, p-value=0.00000 V91_missing_flag , chisquared=522.48477, p-value=0.00000 V92_missing_flag , chisquared=522.48477, p-value=0.00000 V93_missing_flag , chisquared=522.48477, p-value=0.00000 V94_missing_flag , chisquared=522.48477, p-value=0.00000 V95_missing_flag , chisquared=2.86829, p-value=0.09034 V96_missing_flag , chisquared=2.86829, p-value=0.09034 V97_missing_flag , chisquared=2.86829, p-value=0.09034 V98_missing_flag , chisquared=2.86829, p-value=0.09034 V99_missing_flag , chisquared=2.86829, p-value=0.09034 V100_missing_flag , chisquared=2.86829, p-value=0.09034 V101_missing_flag , chisquared=2.86829, p-value=0.09034 V102_missing_flag , chisquared=2.86829, p-value=0.09034 V103_missing_flag , chisquared=2.86829, p-value=0.09034 V104_missing_flag , chisquared=2.86829, p-value=0.09034 V105_missing_flag , chisquared=2.86829, p-value=0.09034 V106_missing_flag , chisquared=2.86829, p-value=0.09034 V107_missing_flag , chisquared=2.86829, p-value=0.09034 V108_missing_flag , chisquared=2.86829, p-value=0.09034 V109_missing_flag , chisquared=2.86829, p-value=0.09034 V110_missing_flag , chisquared=2.86829, p-value=0.09034 V111_missing_flag , chisquared=2.86829, p-value=0.09034 V112_missing_flag , chisquared=2.86829, p-value=0.09034 V113_missing_flag , chisquared=2.86829, p-value=0.09034 V114_missing_flag , chisquared=2.86829, p-value=0.09034 V115_missing_flag , chisquared=2.86829, p-value=0.09034 V116_missing_flag , chisquared=2.86829, p-value=0.09034 V117_missing_flag , chisquared=2.86829, p-value=0.09034 V118_missing_flag , chisquared=2.86829, p-value=0.09034 V119_missing_flag , chisquared=2.86829, p-value=0.09034 V120_missing_flag , chisquared=2.86829, p-value=0.09034 V121_missing_flag , chisquared=2.86829, p-value=0.09034 V122_missing_flag , chisquared=2.86829, p-value=0.09034 V123_missing_flag , chisquared=2.86829, p-value=0.09034 V124_missing_flag , chisquared=2.86829, p-value=0.09034 V125_missing_flag , chisquared=2.86829, p-value=0.09034 V126_missing_flag , chisquared=2.86829, p-value=0.09034 V127_missing_flag , chisquared=2.86829, p-value=0.09034 V128_missing_flag , chisquared=2.86829, p-value=0.09034 V129_missing_flag , chisquared=2.86829, p-value=0.09034 V130_missing_flag , chisquared=2.86829, p-value=0.09034 V131_missing_flag , chisquared=2.86829, p-value=0.09034 V132_missing_flag , chisquared=2.86829, p-value=0.09034 V133_missing_flag , chisquared=2.86829, p-value=0.09034 V134_missing_flag , chisquared=2.86829, p-value=0.09034 V135_missing_flag , chisquared=2.86829, p-value=0.09034 V136_missing_flag , chisquared=2.86829, p-value=0.09034 V137_missing_flag , chisquared=2.86829, p-value=0.09034 V138_missing_flag , chisquared=256.78110, p-value=0.00000 V139_missing_flag , chisquared=256.78110, p-value=0.00000 V140_missing_flag , chisquared=256.78110, p-value=0.00000 V141_missing_flag , chisquared=256.78110, p-value=0.00000 V142_missing_flag , chisquared=256.78110, p-value=0.00000 V143_missing_flag , chisquared=257.28425, p-value=0.00000 V144_missing_flag , chisquared=257.28425, p-value=0.00000 V145_missing_flag , chisquared=257.28425, p-value=0.00000 V146_missing_flag , chisquared=256.78110, p-value=0.00000 V147_missing_flag , chisquared=256.78110, p-value=0.00000 V148_missing_flag , chisquared=256.78110, p-value=0.00000 V149_missing_flag , chisquared=256.78110, p-value=0.00000 V150_missing_flag , chisquared=257.28425, p-value=0.00000 V151_missing_flag , chisquared=257.28425, p-value=0.00000 V152_missing_flag , chisquared=257.28425, p-value=0.00000 V153_missing_flag , chisquared=256.78110, p-value=0.00000 V154_missing_flag , chisquared=256.78110, p-value=0.00000 V155_missing_flag , chisquared=256.78110, p-value=0.00000 V156_missing_flag , chisquared=256.78110, p-value=0.00000 V157_missing_flag , chisquared=256.78110, p-value=0.00000 V158_missing_flag , chisquared=256.78110, p-value=0.00000 V159_missing_flag , chisquared=257.28425, p-value=0.00000 V160_missing_flag , chisquared=257.28425, p-value=0.00000 V161_missing_flag , chisquared=256.78110, p-value=0.00000 V162_missing_flag , chisquared=256.78110, p-value=0.00000 V163_missing_flag , chisquared=256.78110, p-value=0.00000 V164_missing_flag , chisquared=257.28425, p-value=0.00000 V165_missing_flag , chisquared=257.28425, p-value=0.00000 V166_missing_flag , chisquared=257.28425, p-value=0.00000 V167_missing_flag , chisquared=10515.97912, p-value=0.00000 V168_missing_flag , chisquared=10515.97912, p-value=0.00000 V169_missing_flag , chisquared=10589.78839, p-value=0.00000 V170_missing_flag , chisquared=10589.78839, p-value=0.00000 V171_missing_flag , chisquared=10589.78839, p-value=0.00000 V172_missing_flag , chisquared=10515.97912, p-value=0.00000 V173_missing_flag , chisquared=10515.97912, p-value=0.00000 V174_missing_flag , chisquared=10589.78839, p-value=0.00000 V175_missing_flag , chisquared=10589.78839, p-value=0.00000 V176_missing_flag , chisquared=10515.97912, p-value=0.00000 V177_missing_flag , chisquared=10515.97912, p-value=0.00000 V178_missing_flag , chisquared=10515.97912, p-value=0.00000 V179_missing_flag , chisquared=10515.97912, p-value=0.00000 V180_missing_flag , chisquared=10589.78839, p-value=0.00000 V181_missing_flag , chisquared=10515.97912, p-value=0.00000 V182_missing_flag , chisquared=10515.97912, p-value=0.00000 V183_missing_flag , chisquared=10515.97912, p-value=0.00000 V184_missing_flag , chisquared=10589.78839, p-value=0.00000 V185_missing_flag , chisquared=10589.78839, p-value=0.00000 V186_missing_flag , chisquared=10515.97912, p-value=0.00000 V187_missing_flag , chisquared=10515.97912, p-value=0.00000 V188_missing_flag , chisquared=10589.78839, p-value=0.00000 V189_missing_flag , chisquared=10589.78839, p-value=0.00000 V190_missing_flag , chisquared=10515.97912, p-value=0.00000 V191_missing_flag , chisquared=10515.97912, p-value=0.00000 V192_missing_flag , chisquared=10515.97912, p-value=0.00000 V193_missing_flag , chisquared=10515.97912, p-value=0.00000 V194_missing_flag , chisquared=10589.78839, p-value=0.00000 V195_missing_flag , chisquared=10589.78839, p-value=0.00000 V196_missing_flag , chisquared=10515.97912, p-value=0.00000 V197_missing_flag , chisquared=10589.78839, p-value=0.00000 V198_missing_flag , chisquared=10589.78839, p-value=0.00000 V199_missing_flag , chisquared=10515.97912, p-value=0.00000 V200_missing_flag , chisquared=10589.78839, p-value=0.00000 V201_missing_flag , chisquared=10589.78839, p-value=0.00000 V202_missing_flag , chisquared=10515.97912, p-value=0.00000 V203_missing_flag , chisquared=10515.97912, p-value=0.00000 V204_missing_flag , chisquared=10515.97912, p-value=0.00000 V205_missing_flag , chisquared=10515.97912, p-value=0.00000 V206_missing_flag , chisquared=10515.97912, p-value=0.00000 V207_missing_flag , chisquared=10515.97912, p-value=0.00000 V208_missing_flag , chisquared=10589.78839, p-value=0.00000 V209_missing_flag , chisquared=10589.78839, p-value=0.00000 V210_missing_flag , chisquared=10589.78839, p-value=0.00000 V211_missing_flag , chisquared=10515.97912, p-value=0.00000 V212_missing_flag , chisquared=10515.97912, p-value=0.00000 V213_missing_flag , chisquared=10515.97912, p-value=0.00000 V214_missing_flag , chisquared=10515.97912, p-value=0.00000 V215_missing_flag , chisquared=10515.97912, p-value=0.00000 V216_missing_flag , chisquared=10515.97912, p-value=0.00000 V217_missing_flag , chisquared=9230.05652, p-value=0.00000 V218_missing_flag , chisquared=9230.05652, p-value=0.00000 V219_missing_flag , chisquared=9230.05652, p-value=0.00000 V220_missing_flag , chisquared=10023.65463, p-value=0.00000 V221_missing_flag , chisquared=10023.65463, p-value=0.00000 V222_missing_flag , chisquared=10023.65463, p-value=0.00000 V223_missing_flag , chisquared=9230.05652, p-value=0.00000 V224_missing_flag , chisquared=9230.05652, p-value=0.00000 V225_missing_flag , chisquared=9230.05652, p-value=0.00000 V226_missing_flag , chisquared=9230.05652, p-value=0.00000 V227_missing_flag , chisquared=10023.65463, p-value=0.00000 V228_missing_flag , chisquared=9230.05652, p-value=0.00000 V229_missing_flag , chisquared=9230.05652, p-value=0.00000 V230_missing_flag , chisquared=9230.05652, p-value=0.00000 V231_missing_flag , chisquared=9230.05652, p-value=0.00000 V232_missing_flag , chisquared=9230.05652, p-value=0.00000 V233_missing_flag , chisquared=9230.05652, p-value=0.00000 V234_missing_flag , chisquared=10023.65463, p-value=0.00000 V235_missing_flag , chisquared=9230.05652, p-value=0.00000 V236_missing_flag , chisquared=9230.05652, p-value=0.00000 V237_missing_flag , chisquared=9230.05652, p-value=0.00000 V238_missing_flag , chisquared=10023.65463, p-value=0.00000 V239_missing_flag , chisquared=10023.65463, p-value=0.00000 V240_missing_flag , chisquared=9230.05652, p-value=0.00000 V241_missing_flag , chisquared=9230.05652, p-value=0.00000 V242_missing_flag , chisquared=9230.05652, p-value=0.00000 V243_missing_flag , chisquared=9230.05652, p-value=0.00000 V244_missing_flag , chisquared=9230.05652, p-value=0.00000 V245_missing_flag , chisquared=10023.65463, p-value=0.00000 V246_missing_flag , chisquared=9230.05652, p-value=0.00000 V247_missing_flag , chisquared=9230.05652, p-value=0.00000 V248_missing_flag , chisquared=9230.05652, p-value=0.00000 V249_missing_flag , chisquared=9230.05652, p-value=0.00000 V250_missing_flag , chisquared=10023.65463, p-value=0.00000 V251_missing_flag , chisquared=10023.65463, p-value=0.00000 V252_missing_flag , chisquared=9230.05652, p-value=0.00000 V253_missing_flag , chisquared=9230.05652, p-value=0.00000 V254_missing_flag , chisquared=9230.05652, p-value=0.00000 V255_missing_flag , chisquared=10023.65463, p-value=0.00000 V256_missing_flag , chisquared=10023.65463, p-value=0.00000 V257_missing_flag , chisquared=9230.05652, p-value=0.00000 V258_missing_flag , chisquared=9230.05652, p-value=0.00000 V259_missing_flag , chisquared=10023.65463, p-value=0.00000 V260_missing_flag , chisquared=9230.05652, p-value=0.00000 V261_missing_flag , chisquared=9230.05652, p-value=0.00000 V262_missing_flag , chisquared=9230.05652, p-value=0.00000 V263_missing_flag , chisquared=9230.05652, p-value=0.00000 V264_missing_flag , chisquared=9230.05652, p-value=0.00000 V265_missing_flag , chisquared=9230.05652, p-value=0.00000 V266_missing_flag , chisquared=9230.05652, p-value=0.00000 V267_missing_flag , chisquared=9230.05652, p-value=0.00000 V268_missing_flag , chisquared=9230.05652, p-value=0.00000 V269_missing_flag , chisquared=9230.05652, p-value=0.00000 V270_missing_flag , chisquared=10023.65463, p-value=0.00000 V271_missing_flag , chisquared=10023.65463, p-value=0.00000 V272_missing_flag , chisquared=10023.65463, p-value=0.00000 V273_missing_flag , chisquared=9230.05652, p-value=0.00000 V274_missing_flag , chisquared=9230.05652, p-value=0.00000 V275_missing_flag , chisquared=9230.05652, p-value=0.00000 V276_missing_flag , chisquared=9230.05652, p-value=0.00000 V277_missing_flag , chisquared=9230.05652, p-value=0.00000 V278_missing_flag , chisquared=9230.05652, p-value=0.00000 V279_missing_flag , chisquared=2.87936, p-value=0.08972 V280_missing_flag , chisquared=2.87936, p-value=0.08972 V281_missing_flag , chisquared=0.02818, p-value=0.86669 V282_missing_flag , chisquared=0.02818, p-value=0.86669 V283_missing_flag , chisquared=0.02818, p-value=0.86669 V284_missing_flag , chisquared=2.87936, p-value=0.08972 V285_missing_flag , chisquared=2.87936, p-value=0.08972 V286_missing_flag , chisquared=2.87936, p-value=0.08972 V287_missing_flag , chisquared=2.87936, p-value=0.08972 V288_missing_flag , chisquared=0.02818, p-value=0.86669 V289_missing_flag , chisquared=0.02818, p-value=0.86669 V290_missing_flag , chisquared=2.87936, p-value=0.08972 V291_missing_flag , chisquared=2.87936, p-value=0.08972 V292_missing_flag , chisquared=2.87936, p-value=0.08972 V293_missing_flag , chisquared=2.87936, p-value=0.08972 V294_missing_flag , chisquared=2.87936, p-value=0.08972 V295_missing_flag , chisquared=2.87936, p-value=0.08972 V296_missing_flag , chisquared=0.02818, p-value=0.86669 V297_missing_flag , chisquared=2.87936, p-value=0.08972 V298_missing_flag , chisquared=2.87936, p-value=0.08972 V299_missing_flag , chisquared=2.87936, p-value=0.08972 V300_missing_flag , chisquared=0.02818, p-value=0.86669 V301_missing_flag , chisquared=0.02818, p-value=0.86669 V302_missing_flag , chisquared=2.87936, p-value=0.08972 V303_missing_flag , chisquared=2.87936, p-value=0.08972 V304_missing_flag , chisquared=2.87936, p-value=0.08972 V305_missing_flag , chisquared=2.87936, p-value=0.08972 V306_missing_flag , chisquared=2.87936, p-value=0.08972 V307_missing_flag , chisquared=2.87936, p-value=0.08972 V308_missing_flag , chisquared=2.87936, p-value=0.08972 V309_missing_flag , chisquared=2.87936, p-value=0.08972 V310_missing_flag , chisquared=2.87936, p-value=0.08972 V311_missing_flag , chisquared=2.87936, p-value=0.08972 V312_missing_flag , chisquared=2.87936, p-value=0.08972 V313_missing_flag , chisquared=0.02818, p-value=0.86669 V314_missing_flag , chisquared=0.02818, p-value=0.86669 V315_missing_flag , chisquared=0.02818, p-value=0.86669 V316_missing_flag , chisquared=2.87936, p-value=0.08972 V317_missing_flag , chisquared=2.87936, p-value=0.08972 V318_missing_flag , chisquared=2.87936, p-value=0.08972 V319_missing_flag , chisquared=2.87936, p-value=0.08972 V320_missing_flag , chisquared=2.87936, p-value=0.08972 V321_missing_flag , chisquared=2.87936, p-value=0.08972 V322_missing_flag , chisquared=270.16694, p-value=0.00000 V323_missing_flag , chisquared=270.16694, p-value=0.00000 V324_missing_flag , chisquared=270.16694, p-value=0.00000 V325_missing_flag , chisquared=270.16694, p-value=0.00000 V326_missing_flag , chisquared=270.16694, p-value=0.00000 V327_missing_flag , chisquared=270.16694, p-value=0.00000 V328_missing_flag , chisquared=270.16694, p-value=0.00000 V329_missing_flag , chisquared=270.16694, p-value=0.00000 V330_missing_flag , chisquared=270.16694, p-value=0.00000 V331_missing_flag , chisquared=270.16694, p-value=0.00000 V332_missing_flag , chisquared=270.16694, p-value=0.00000 V333_missing_flag , chisquared=270.16694, p-value=0.00000 V334_missing_flag , chisquared=270.16694, p-value=0.00000 V335_missing_flag , chisquared=270.16694, p-value=0.00000 V336_missing_flag , chisquared=270.16694, p-value=0.00000 V337_missing_flag , chisquared=270.16694, p-value=0.00000 V338_missing_flag , chisquared=270.16694, p-value=0.00000 V339_missing_flag , chisquared=270.16694, p-value=0.00000 id_01_missing_flag , chisquared=429.43113, p-value=0.00000 id_02_missing_flag , chisquared=418.79840, p-value=0.00000 id_03_missing_flag , chisquared=155.60340, p-value=0.00000 id_04_missing_flag , chisquared=155.60340, p-value=0.00000 id_05_missing_flag , chisquared=410.62370, p-value=0.00000 id_06_missing_flag , chisquared=410.62370, p-value=0.00000 id_07_missing_flag , chisquared=11.67033, p-value=0.00064 id_08_missing_flag , chisquared=11.67033, p-value=0.00064 id_09_missing_flag , chisquared=186.68294, p-value=0.00000 id_10_missing_flag , chisquared=186.68294, p-value=0.00000 id_11_missing_flag , chisquared=419.74461, p-value=0.00000 id_12_missing_flag , chisquared=429.43113, p-value=0.00000 id_13_missing_flag , chisquared=372.26945, p-value=0.00000 id_14_missing_flag , chisquared=186.57626, p-value=0.00000 id_15_missing_flag , chisquared=419.89703, p-value=0.00000 id_16_missing_flag , chisquared=365.88234, p-value=0.00000 id_17_missing_flag , chisquared=421.50054, p-value=0.00000 id_18_missing_flag , chisquared=108.11199, p-value=0.00000 id_19_missing_flag , chisquared=421.07000, p-value=0.00000 id_20_missing_flag , chisquared=419.82460, p-value=0.00000 id_21_missing_flag , chisquared=11.73422, p-value=0.00061 id_22_missing_flag , chisquared=11.89450, p-value=0.00056 id_23_missing_flag , chisquared=11.89450, p-value=0.00056 id_24_missing_flag , chisquared=9.86160, p-value=0.00169 id_25_missing_flag , chisquared=12.35494, p-value=0.00044 id_26_missing_flag , chisquared=11.79823, p-value=0.00059 id_27_missing_flag , chisquared=11.89450, p-value=0.00056 id_28_missing_flag , chisquared=419.74461, p-value=0.00000 id_29_missing_flag , chisquared=419.74461, p-value=0.00000 id_30_missing_flag , chisquared=175.29045, p-value=0.00000 id_31_missing_flag , chisquared=417.52693, p-value=0.00000 id_32_missing_flag , chisquared=175.65824, p-value=0.00000 id_33_missing_flag , chisquared=171.60347, p-value=0.00000 id_34_missing_flag , chisquared=180.63611, p-value=0.00000 id_35_missing_flag , chisquared=419.89703, p-value=0.00000 id_36_missing_flag , chisquared=419.89703, p-value=0.00000 id_37_missing_flag , chisquared=419.89703, p-value=0.00000 id_38_missing_flag , chisquared=419.89703, p-value=0.00000 DeviceType_missing_flag , chisquared=417.45054, p-value=0.00000 DeviceInfo_missing_flag , chisquared=317.74598, p-value=0.00000
# Significant variables
# print(significant_categorical_variables)
Chi-Square test tells if the entire variable is useful or not.
ctab = pd.crosstab(df['ProductCD'], df['isFraud'].astype('category'))
ctab
| isFraud | 0 | 1 |
|---|---|---|
| ProductCD | ||
| C | 60511 | 8008 |
| H | 31450 | 1574 |
| R | 36273 | 1426 |
| S | 10942 | 686 |
| W | 430701 | 8969 |
ctab.columns = ctab.columns.add_categories('odds')
ctab['odds'] = ctab[1]/ctab[0]
ctab
| isFraud | 0 | 1 | odds |
|---|---|---|---|
| ProductCD | |||
| C | 60511 | 8008 | 0.132340 |
| H | 31450 | 1574 | 0.050048 |
| R | 36273 | 1426 | 0.039313 |
| S | 10942 | 686 | 0.062694 |
| W | 430701 | 8969 | 0.020824 |
ctab.columns = ctab.columns.add_categories('odds_ratio')
ctab['odds_ratio'] = ctab['odds'] / (ctab[1].sum()/ctab[0].sum())
ctab
| isFraud | 0 | 1 | odds | odds_ratio |
|---|---|---|---|---|
| ProductCD | ||||
| C | 60511 | 8008 | 0.132340 | 3.649871 |
| H | 31450 | 1574 | 0.050048 | 1.380295 |
| R | 36273 | 1426 | 0.039313 | 1.084236 |
| S | 10942 | 686 | 0.062694 | 1.729080 |
| W | 430701 | 8969 | 0.020824 | 0.574323 |
Highers odds ratio implies more chance of fraud in that category.
Farther away it is from 1.0 (both directions) more important the variable is.
from scipy.stats import f_oneway
# significance value
alpha = 0.05
significant_numerical_variables = []
for col in num_columns[2:]:
# Determine whether to reject or keep your null hypothesis
if df.loc[:, col].nunique() > 50:
F, p = f_oneway(df[df.isFraud == 1][col].dropna(),
df[df.isFraud == 0][col].dropna())
print(col.ljust(40), ', F-statistic=%.5f, p=%.5f' % (F, p), df.loc[:, col].nunique())
if p <= alpha:
significant_numerical_variables.append(col)
TransactionAmt , F-statistic=75.67718, p=0.00000 8195 card1 , F-statistic=109.88932, p=0.00000 13553 card2 , F-statistic=6.67558, p=0.00977 500 card3 , F-statistic=14336.21578, p=0.00000 114 card5 , F-statistic=661.82659, p=0.00000 119 addr1 , F-statistic=16.43435, p=0.00005 332 addr2 , F-statistic=485.06660, p=0.00000 74 dist1 , F-statistic=110.42966, p=0.00000 2412 C1 , F-statistic=552.37693, p=0.00000 1495 C2 , F-statistic=819.61843, p=0.00000 1167 C4 , F-statistic=545.60992, p=0.00000 1223 C5 , F-statistic=559.06343, p=0.00000 319 C6 , F-statistic=258.28478, p=0.00000 1291 C7 , F-statistic=468.69042, p=0.00000 1069 C8 , F-statistic=610.57329, p=0.00000 1130 C9 , F-statistic=594.15081, p=0.00000 205 C10 , F-statistic=476.56495, p=0.00000 1122 C11 , F-statistic=446.39851, p=0.00000 1343 C12 , F-statistic=601.74057, p=0.00000 1066 C13 , F-statistic=73.37132, p=0.00000 1464 C14 , F-statistic=37.04987, p=0.00000 1108 D1 , F-statistic=2672.56121, p=0.00000 641 D2 , F-statistic=2179.12220, p=0.00000 641 D3 , F-statistic=703.03864, p=0.00000 649 D4 , F-statistic=1913.51964, p=0.00000 808 D5 , F-statistic=1177.67893, p=0.00000 688 D6 , F-statistic=240.53556, p=0.00000 829 D8 , F-statistic=1555.93433, p=0.00000 5367 D10 , F-statistic=2681.29292, p=0.00000 818 D11 , F-statistic=634.20722, p=0.00000 676 D12 , F-statistic=53.96035, p=0.00000 635 D13 , F-statistic=219.58119, p=0.00000 577 D14 , F-statistic=4.66673, p=0.03076 802 D15 , F-statistic=3031.40845, p=0.00000 859 V37 , F-statistic=13626.03770, p=0.00000 55 V38 , F-statistic=17383.75607, p=0.00000 55 V56 , F-statistic=1940.19590, p=0.00000 52 V95 , F-statistic=10.01732, p=0.00155 881 V96 , F-statistic=17.75408, p=0.00003 1410 V97 , F-statistic=11.90221, p=0.00056 976 V99 , F-statistic=102.52878, p=0.00000 89 V101 , F-statistic=13.10561, p=0.00029 870 V102 , F-statistic=13.87026, p=0.00020 1285 V103 , F-statistic=15.47712, p=0.00008 928 V105 , F-statistic=6.15509, p=0.01310 100 V106 , F-statistic=2.58112, p=0.10815 56 V126 , F-statistic=7.41985, p=0.00645 10299 V127 , F-statistic=4.07834, p=0.04344 24414 V128 , F-statistic=2.19911, p=0.13809 14507 V129 , F-statistic=95.66082, p=0.00000 1608 V130 , F-statistic=5.58774, p=0.01809 5511 V131 , F-statistic=368.35409, p=0.00000 3097 V132 , F-statistic=10.72701, p=0.00106 6560 V133 , F-statistic=5.91363, p=0.01502 9949 V134 , F-statistic=7.27959, p=0.00697 8178 V135 , F-statistic=0.04505, p=0.83192 3724 V136 , F-statistic=0.00105, p=0.97410 4852 V137 , F-statistic=0.00778, p=0.92972 4252 V143 , F-statistic=72.67628, p=0.00000 870 V144 , F-statistic=221.93537, p=0.00000 63 V145 , F-statistic=315.85937, p=0.00000 260 V150 , F-statistic=336.66794, p=0.00000 1344 V151 , F-statistic=304.74317, p=0.00000 56 V159 , F-statistic=323.02425, p=0.00000 2492 V160 , F-statistic=319.70138, p=0.00000 9621 V161 , F-statistic=130.04611, p=0.00000 79 V162 , F-statistic=233.47479, p=0.00000 185 V163 , F-statistic=177.81176, p=0.00000 106 V164 , F-statistic=61.77344, p=0.00000 1978 V165 , F-statistic=217.06800, p=0.00000 2547 V166 , F-statistic=71.06800, p=0.00000 987 V167 , F-statistic=23.07236, p=0.00000 873 V168 , F-statistic=35.84863, p=0.00000 965 V171 , F-statistic=6876.38875, p=0.00000 62 V177 , F-statistic=26.27124, p=0.00000 862 V178 , F-statistic=38.22487, p=0.00000 1236 V179 , F-statistic=35.32729, p=0.00000 921 V180 , F-statistic=12.19456, p=0.00048 84 V182 , F-statistic=14.26650, p=0.00016 84 V187 , F-statistic=82.01146, p=0.00000 215 V201 , F-statistic=16856.02855, p=0.00000 56 V202 , F-statistic=45.39293, p=0.00000 10970 V203 , F-statistic=55.54978, p=0.00000 14951 V204 , F-statistic=60.13305, p=0.00000 12858 V205 , F-statistic=13.67504, p=0.00022 1953 V206 , F-statistic=4.52806, p=0.03335 1581 V207 , F-statistic=21.53123, p=0.00000 2705 V208 , F-statistic=11.70640, p=0.00062 2093 V209 , F-statistic=49.58723, p=0.00000 2674 V210 , F-statistic=0.05960, p=0.80713 2262 V211 , F-statistic=48.17990, p=0.00000 7624 V212 , F-statistic=62.41706, p=0.00000 8868 V213 , F-statistic=59.28563, p=0.00000 8317 V214 , F-statistic=3.28560, p=0.06989 2282 V215 , F-statistic=3.21830, p=0.07282 2747 V216 , F-statistic=0.05228, p=0.81914 2532 V217 , F-statistic=273.98881, p=0.00000 304 V218 , F-statistic=295.75410, p=0.00000 401 V219 , F-statistic=277.07672, p=0.00000 379 V221 , F-statistic=2528.82003, p=0.00000 77 V222 , F-statistic=4026.25520, p=0.00000 76 V224 , F-statistic=0.17276, p=0.67767 79 V226 , F-statistic=0.67393, p=0.41169 81 V228 , F-statistic=10162.77697, p=0.00000 55 V229 , F-statistic=2503.20448, p=0.00000 91 V230 , F-statistic=7401.92756, p=0.00000 66 V231 , F-statistic=222.73487, p=0.00000 294 V232 , F-statistic=390.01638, p=0.00000 338 V233 , F-statistic=279.30291, p=0.00000 333 V234 , F-statistic=53.01029, p=0.00000 122 V245 , F-statistic=1189.20909, p=0.00000 58 V253 , F-statistic=50.39006, p=0.00000 66 V258 , F-statistic=12632.06226, p=0.00000 67 V259 , F-statistic=3328.22023, p=0.00000 68 V263 , F-statistic=28.30710, p=0.00000 10422 V264 , F-statistic=11.42220, p=0.00073 13358 V265 , F-statistic=26.39408, p=0.00000 11757 V266 , F-statistic=3.60378, p=0.05765 1871 V267 , F-statistic=7.12289, p=0.00761 2884 V268 , F-statistic=3.47409, p=0.06234 2286 V269 , F-statistic=4.54627, p=0.03299 151 V270 , F-statistic=34.78841, p=0.00000 1972 V271 , F-statistic=105.06963, p=0.00000 2286 V272 , F-statistic=91.15867, p=0.00000 2082 V273 , F-statistic=32.05464, p=0.00000 4689 V274 , F-statistic=31.41484, p=0.00000 8315 V275 , F-statistic=38.23804, p=0.00000 4965 V276 , F-statistic=12.22436, p=0.00047 2263 V277 , F-statistic=15.03960, p=0.00011 2540 V278 , F-statistic=13.14749, p=0.00029 2398 V279 , F-statistic=8.30948, p=0.00394 881 V280 , F-statistic=0.36129, p=0.54779 975 V283 , F-statistic=7585.01985, p=0.00000 62 V285 , F-statistic=49.20336, p=0.00000 96 V290 , F-statistic=953.42763, p=0.00000 58 V291 , F-statistic=246.83555, p=0.00000 219 V292 , F-statistic=423.76303, p=0.00000 173 V293 , F-statistic=11.96267, p=0.00054 870 V294 , F-statistic=10.61896, p=0.00112 1286 V295 , F-statistic=1.66959, p=0.19631 928 V296 , F-statistic=10.20032, p=0.00140 94 V298 , F-statistic=0.98916, p=0.31995 94 V306 , F-statistic=2.04992, p=0.15222 16210 V307 , F-statistic=20.89358, p=0.00000 37367 V308 , F-statistic=6.25009, p=0.01242 23064 V309 , F-statistic=242.35559, p=0.00000 3239 V310 , F-statistic=72.36754, p=0.00000 7759 V311 , F-statistic=0.99820, p=0.31775 2526 V312 , F-statistic=835.01886, p=0.00000 5143 V313 , F-statistic=1016.37502, p=0.00000 3915 V314 , F-statistic=876.33173, p=0.00000 5974 V315 , F-statistic=1377.84553, p=0.00000 4540 V316 , F-statistic=5.17498, p=0.02291 9814 V317 , F-statistic=14.82121, p=0.00012 15184 V318 , F-statistic=0.58715, p=0.44352 12309 V319 , F-statistic=0.00218, p=0.96276 4799 V320 , F-statistic=14.53340, p=0.00014 6439 V321 , F-statistic=1.66098, p=0.19747 5560 V322 , F-statistic=38.22813, p=0.00000 881 V323 , F-statistic=44.84077, p=0.00000 1411 V324 , F-statistic=47.48458, p=0.00000 976 V329 , F-statistic=43.96148, p=0.00000 100 V330 , F-statistic=36.90161, p=0.00000 56 V331 , F-statistic=39.81114, p=0.00000 1758 V332 , F-statistic=45.37959, p=0.00000 2453 V333 , F-statistic=47.99171, p=0.00000 1971 V334 , F-statistic=0.01673, p=0.89708 143 V335 , F-statistic=2.44629, p=0.11781 669 V336 , F-statistic=0.47508, p=0.49066 355 V337 , F-statistic=2.67789, p=0.10175 254 V338 , F-statistic=30.86541, p=0.00000 380 V339 , F-statistic=17.71005, p=0.00003 334 id_01 , F-statistic=0.21069, p=0.64623 77 id_02 , F-statistic=5.40582, p=0.02007 115655 id_05 , F-statistic=0.62754, p=0.42826 93 id_06 , F-statistic=0.32488, p=0.56869 101 id_10 , F-statistic=0.15965, p=0.68948 62 id_11 , F-statistic=0.74750, p=0.38727 146 id_13 , F-statistic=1.07632, p=0.29952 54 id_17 , F-statistic=1.92097, p=0.16575 104 id_19 , F-statistic=0.09828, p=0.75391 522 id_20 , F-statistic=0.34673, p=0.55597 394 Date , F-statistic=101.40691, p=0.00000 573349
# Significant variables
# significant_numerical_variables
TransactionAmt is right skewed so log transform needs to be used to make it normally distributed Feature engineering is the process of using domain and statistical knowledge to extract features from raw data via data mining techniques.
These features often help to improve the performance of machine learning models.
The goal of this section is to:
df.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | V29 | V30 | V31 | V32 | V33 | V34 | V35 | V36 | V37 | V38 | V39 | V40 | V41 | V42 | V43 | V44 | V45 | V46 | V47 | V48 | V49 | V50 | V51 | V52 | V53 | V54 | V55 | V56 | V57 | V58 | V59 | V60 | V61 | V62 | V63 | V64 | V65 | V66 | V67 | V68 | V69 | V70 | V71 | V72 | V73 | V74 | V75 | V76 | V77 | V78 | V79 | V80 | V81 | V82 | V83 | V84 | V85 | V86 | V87 | V88 | V89 | V90 | V91 | V92 | V93 | V94 | V95 | V96 | V97 | V98 | V99 | V100 | V101 | V102 | V103 | V104 | V105 | V106 | V107 | V108 | V109 | V110 | V111 | V112 | V113 | V114 | V115 | V116 | V117 | V118 | V119 | V120 | V121 | V122 | V123 | V124 | V125 | V126 | V127 | V128 | V129 | V130 | V131 | V132 | V133 | V134 | V135 | V136 | V137 | V138 | V139 | V140 | V141 | V142 | V143 | V144 | V145 | V146 | V147 | V148 | V149 | V150 | V151 | V152 | V153 | V154 | V155 | V156 | V157 | V158 | V159 | V160 | V161 | V162 | V163 | V164 | V165 | V166 | V167 | V168 | V169 | V170 | V171 | V172 | V173 | V174 | V175 | V176 | V177 | V178 | V179 | V180 | V181 | V182 | V183 | V184 | V185 | V186 | V187 | V188 | V189 | V190 | V191 | V192 | V193 | V194 | V195 | V196 | V197 | ... | V134_missing_flag | V135_missing_flag | V136_missing_flag | V137_missing_flag | V138_missing_flag | V139_missing_flag | V140_missing_flag | V141_missing_flag | V142_missing_flag | V143_missing_flag | V144_missing_flag | V145_missing_flag | V146_missing_flag | V147_missing_flag | V148_missing_flag | V149_missing_flag | V150_missing_flag | V151_missing_flag | V152_missing_flag | V153_missing_flag | V154_missing_flag | V155_missing_flag | V156_missing_flag | V157_missing_flag | V158_missing_flag | V159_missing_flag | V160_missing_flag | V161_missing_flag | V162_missing_flag | V163_missing_flag | V164_missing_flag | V165_missing_flag | V166_missing_flag | V167_missing_flag | V168_missing_flag | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | Date | _Weekdays | _Hours | _Days | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 68.5 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NoInf | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | True | True | True | True | True | True | False | False | True | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:00 | 5 | 0 | 2 |
| 1 | 2987001 | 0 | 86401 | 29.0 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:01 | 5 | 0 | 2 | |
| 2 | 2987002 | 0 | 86469 | 59.0 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | Microsoft | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | False | 2017-12-02 00:01:09 | 5 | 0 | 2 |
| 3 | 2987003 | 0 | 86499 | 50.0 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | Yahoo Mail | NoInf | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 48.0 | 28.0 | 0.0 | 10.0 | 4.0 | 1.0 | 38.0 | 24.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 50.0 | 1758.0 | 925.0 | 0.0 | 354.0 | 135.0 | 50.0 | 1404.0 | 790.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | True | 2017-12-02 00:01:39 | 5 | 0 | 2 |
| 4 | 2987004 | 0 | 86506 | 50.0 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 18.0 | 140.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1803.0 | 49.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 15560.0 | 169690.796875 | 0.0 | 0.0 | 0.0 | 515.0 | 5155.0 | 2840.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | False | True | False | False | False | False | False | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:01:46 | 5 | 0 | 2 |
5 rows × 840 columns
You need to engineer the domain specific features. This might boost up the predictive power. This often gives better performing models
Domain knowledge is one of the key pillars of data science. So always understand the domain before attempting the problem.
# Transaction amount minus mean of transaction
df['Trans_min_mean'] = df['TransactionAmt'] - np.nanmean(df['TransactionAmt'],dtype="float64")
df['Trans_min_std'] = df['Trans_min_mean'] / np.nanstd(df['TransactionAmt'].astype("float64"),dtype="float64")
Replace value by the group's mean (or standard dev)
# Features for transaction amount and card
df['TransactionAmt_to_mean_card1'] = df['TransactionAmt'] / df.groupby(['card1'])['TransactionAmt'].transform('mean')
df['TransactionAmt_to_mean_card4'] = df['TransactionAmt'] / df.groupby(['card4'])['TransactionAmt'].transform('mean')
df['TransactionAmt_to_std_card1'] = df['TransactionAmt'] / df.groupby(['card1'])['TransactionAmt'].transform('std')
df['TransactionAmt_to_std_card4'] = df['TransactionAmt'] / df.groupby(['card4'])['TransactionAmt'].transform('std')
# Log of transaction amount
df['TransactionAmt'] = np.log(df['TransactionAmt'])
df.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | V29 | V30 | V31 | V32 | V33 | V34 | V35 | V36 | V37 | V38 | V39 | V40 | V41 | V42 | V43 | V44 | V45 | V46 | V47 | V48 | V49 | V50 | V51 | V52 | V53 | V54 | V55 | V56 | V57 | V58 | V59 | V60 | V61 | V62 | V63 | V64 | V65 | V66 | V67 | V68 | V69 | V70 | V71 | V72 | V73 | V74 | V75 | V76 | V77 | V78 | V79 | V80 | V81 | V82 | V83 | V84 | V85 | V86 | V87 | V88 | V89 | V90 | V91 | V92 | V93 | V94 | V95 | V96 | V97 | V98 | V99 | V100 | V101 | V102 | V103 | V104 | V105 | V106 | V107 | V108 | V109 | V110 | V111 | V112 | V113 | V114 | V115 | V116 | V117 | V118 | V119 | V120 | V121 | V122 | V123 | V124 | V125 | V126 | V127 | V128 | V129 | V130 | V131 | V132 | V133 | V134 | V135 | V136 | V137 | V138 | V139 | V140 | V141 | V142 | V143 | V144 | V145 | V146 | V147 | V148 | V149 | V150 | V151 | V152 | V153 | V154 | V155 | V156 | V157 | V158 | V159 | V160 | V161 | V162 | V163 | V164 | V165 | V166 | V167 | V168 | V169 | V170 | V171 | V172 | V173 | V174 | V175 | V176 | V177 | V178 | V179 | V180 | V181 | V182 | V183 | V184 | V185 | V186 | V187 | V188 | V189 | V190 | V191 | V192 | V193 | V194 | V195 | V196 | V197 | ... | V140_missing_flag | V141_missing_flag | V142_missing_flag | V143_missing_flag | V144_missing_flag | V145_missing_flag | V146_missing_flag | V147_missing_flag | V148_missing_flag | V149_missing_flag | V150_missing_flag | V151_missing_flag | V152_missing_flag | V153_missing_flag | V154_missing_flag | V155_missing_flag | V156_missing_flag | V157_missing_flag | V158_missing_flag | V159_missing_flag | V160_missing_flag | V161_missing_flag | V162_missing_flag | V163_missing_flag | V164_missing_flag | V165_missing_flag | V166_missing_flag | V167_missing_flag | V168_missing_flag | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | Date | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 4.226834 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NoInf | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | True | True | True | True | True | True | False | False | True | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:00 | 5 | 0 | 2 | -66.527347 | -0.278174 | 0.194638 | 0.257761 | 0.184560 | 0.170241 |
| 1 | 2987001 | 0 | 86401 | 3.367296 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:01 | 5 | 0 | 2 | -106.027347 | -0.443337 | 0.123780 | 0.219053 | 0.063004 | 0.114214 | |
| 2 | 2987002 | 0 | 86469 | 4.077537 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | Microsoft | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | False | 2017-12-02 00:01:09 | 5 | 0 | 2 | -76.027347 | -0.317897 | 0.608151 | 0.443070 | 0.589226 | 0.258550 |
| 3 | 2987003 | 0 | 86499 | 3.912023 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | Yahoo Mail | NoInf | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 48.0 | 28.0 | 0.0 | 10.0 | 4.0 | 1.0 | 38.0 | 24.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 50.0 | 1758.0 | 925.0 | 0.0 | 354.0 | 135.0 | 50.0 | 1404.0 | 790.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | True | 2017-12-02 00:01:39 | 5 | 0 | 2 | -85.027347 | -0.355529 | 0.405136 | 0.377678 | 0.259460 | 0.196921 |
| 4 | 2987004 | 0 | 86506 | 3.912023 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 18.0 | 140.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1803.0 | 49.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 15560.0 | 169690.796875 | 0.0 | 0.0 | 0.0 | 515.0 | 5155.0 | 2840.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | False | True | False | False | False | False | False | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:01:46 | 5 | 0 | 2 | -85.027347 | -0.355529 | 0.515616 | 0.377678 | 0.882898 | 0.196921 |
5 rows × 846 columns
# Save train df to csv file
# df.to_csv("Intermediate_Datasets/df_intermediate1.csv",index = False)
# Read train df
df = pd.read_csv("Intermediate_Datasets/df_intermediate1.csv")
When dealing with high dimensional data, it is often useful to reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data.
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
Principal component analysis is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance.
# initialize function to perform PCA
def perform_PCA(df, cols, n_components, prefix='PCA_', rand_seed=4):
pca = PCA(n_components=n_components, random_state=rand_seed)
principalComponents = pca.fit_transform(df[cols])
principalDf = pd.DataFrame(principalComponents)
df.drop(cols, axis=1, inplace=True)
principalDf.rename(columns=lambda x: str(prefix)+str(x), inplace=True)
df = pd.concat([df, principalDf], axis=1)
return df
Create a list of all the columns on which PCA needs to performed
# Columns starting from V1 to V339
filter_col = df.columns[53:392]
Impute missing values in the mas_v columns, later use minmax_scale function to scale the values in these columns
from sklearn.preprocessing import minmax_scale
# Fill na values and scale V columns
for col in filter_col:
df[col] = df[col].fillna((df[col].min() - 2))
df[col] = (minmax_scale(df[col], feature_range=(0,1)))
# Perform PCA
df = perform_PCA(df, filter_col, prefix='PCA_V_', n_components=30)
Reduce memory usage of df as lot of new features have been created
df = reduce_mem_usage(df)
Mem. usage decreased to 1138.99 Mb (21.4% reduction)
df.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_19 | id_20 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | card2_missing_flag | card3_missing_flag | card4_missing_flag | card5_missing_flag | card6_missing_flag | addr1_missing_flag | addr2_missing_flag | dist1_missing_flag | dist2_missing_flag | P_emaildomain_missing_flag | R_emaildomain_missing_flag | D1_missing_flag | D2_missing_flag | D3_missing_flag | D4_missing_flag | D5_missing_flag | D6_missing_flag | D7_missing_flag | D8_missing_flag | D9_missing_flag | D10_missing_flag | D11_missing_flag | D12_missing_flag | D13_missing_flag | D14_missing_flag | D15_missing_flag | M1_missing_flag | M2_missing_flag | M3_missing_flag | M4_missing_flag | M5_missing_flag | M6_missing_flag | M7_missing_flag | M8_missing_flag | M9_missing_flag | V1_missing_flag | V2_missing_flag | V3_missing_flag | V4_missing_flag | V5_missing_flag | V6_missing_flag | V7_missing_flag | V8_missing_flag | V9_missing_flag | V10_missing_flag | V11_missing_flag | V12_missing_flag | V13_missing_flag | V14_missing_flag | V15_missing_flag | V16_missing_flag | V17_missing_flag | V18_missing_flag | V19_missing_flag | V20_missing_flag | V21_missing_flag | V22_missing_flag | V23_missing_flag | V24_missing_flag | V25_missing_flag | V26_missing_flag | V27_missing_flag | V28_missing_flag | V29_missing_flag | V30_missing_flag | V31_missing_flag | V32_missing_flag | V33_missing_flag | V34_missing_flag | V35_missing_flag | V36_missing_flag | V37_missing_flag | V38_missing_flag | V39_missing_flag | V40_missing_flag | V41_missing_flag | V42_missing_flag | V43_missing_flag | V44_missing_flag | V45_missing_flag | V46_missing_flag | V47_missing_flag | V48_missing_flag | V49_missing_flag | V50_missing_flag | V51_missing_flag | V52_missing_flag | V53_missing_flag | V54_missing_flag | V55_missing_flag | V56_missing_flag | V57_missing_flag | V58_missing_flag | V59_missing_flag | V60_missing_flag | V61_missing_flag | V62_missing_flag | V63_missing_flag | V64_missing_flag | V65_missing_flag | V66_missing_flag | V67_missing_flag | V68_missing_flag | V69_missing_flag | V70_missing_flag | V71_missing_flag | V72_missing_flag | V73_missing_flag | V74_missing_flag | V75_missing_flag | V76_missing_flag | V77_missing_flag | V78_missing_flag | V79_missing_flag | V80_missing_flag | V81_missing_flag | V82_missing_flag | V83_missing_flag | V84_missing_flag | V85_missing_flag | V86_missing_flag | V87_missing_flag | V88_missing_flag | V89_missing_flag | V90_missing_flag | V91_missing_flag | V92_missing_flag | V93_missing_flag | V94_missing_flag | V95_missing_flag | V96_missing_flag | V97_missing_flag | V98_missing_flag | V99_missing_flag | V100_missing_flag | V101_missing_flag | V102_missing_flag | V103_missing_flag | V104_missing_flag | V105_missing_flag | V106_missing_flag | V107_missing_flag | V108_missing_flag | V109_missing_flag | V110_missing_flag | V111_missing_flag | V112_missing_flag | V113_missing_flag | V114_missing_flag | V115_missing_flag | V116_missing_flag | V117_missing_flag | V118_missing_flag | V119_missing_flag | V120_missing_flag | V121_missing_flag | V122_missing_flag | V123_missing_flag | V124_missing_flag | V125_missing_flag | V126_missing_flag | V127_missing_flag | V128_missing_flag | V129_missing_flag | V130_missing_flag | V131_missing_flag | V132_missing_flag | ... | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | Date | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | PCA_V_0 | PCA_V_1 | PCA_V_2 | PCA_V_3 | PCA_V_4 | PCA_V_5 | PCA_V_6 | PCA_V_7 | PCA_V_8 | PCA_V_9 | PCA_V_10 | PCA_V_11 | PCA_V_12 | PCA_V_13 | PCA_V_14 | PCA_V_15 | PCA_V_16 | PCA_V_17 | PCA_V_18 | PCA_V_19 | PCA_V_20 | PCA_V_21 | PCA_V_22 | PCA_V_23 | PCA_V_24 | PCA_V_25 | PCA_V_26 | PCA_V_27 | PCA_V_28 | PCA_V_29 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 4.226562 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NoInf | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 0.0 | 70787.0 | NaN | NaN | NaN | NaN | NaN | NaN | 100.0 | NotFound | NaN | -480.0 | New | NotFound | 166.0 | 542.0 | 144.0 | New | NotFound | Android | Samsung | 32.0 | 2220x1080 | match_status:2 | T | F | T | T | mobile | SAMSUNG SM-G892A Build/NRD90M | True | False | False | False | False | False | False | False | True | True | True | False | True | False | True | True | True | True | True | True | False | False | True | True | True | False | False | False | False | False | False | False | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | True | True | True | True | True | True | False | False | True | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:00 | 5 | 0 | 2 | -66.5 | -0.278076 | 0.194580 | 0.257812 | 0.184560 | 0.170288 | -0.157349 | 0.919434 | -0.843750 | 0.308105 | -0.089417 | 0.003044 | -0.020050 | -0.187622 | 0.038208 | 0.002604 | -0.010536 | 0.034058 | -0.044434 | -0.089722 | 0.044769 | 0.001550 | -0.003441 | 0.018616 | -0.018387 | 0.010078 | -0.026947 | -0.021362 | -0.054626 | 0.025375 | 0.018814 | -0.006039 | 0.004055 | -0.043335 | 0.008026 | -0.007957 |
| 1 | 2987001 | 0 | 86401 | 3.367188 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | -5.0 | 98945.0 | NaN | NaN | 0.0 | -5.0 | NaN | NaN | 100.0 | NotFound | 49.0 | -300.0 | New | NotFound | 166.0 | 621.0 | 500.0 | New | NotFound | iOS | Safari | 32.0 | 1334x750 | match_status:1 | T | F | F | T | mobile | iOS Device | False | False | False | False | False | False | False | True | True | False | True | False | True | True | False | True | True | True | True | True | False | True | True | True | True | False | True | True | True | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:00:01 | 5 | 0 | 2 | -106.0 | -0.443359 | 0.123779 | 0.218994 | 0.063004 | 0.114197 | -0.086365 | -0.800293 | -0.152344 | -0.363525 | -0.101868 | -0.002291 | 0.032318 | -0.068848 | 0.040222 | -0.180176 | -0.059387 | 0.002302 | 0.018982 | -0.029556 | 0.016647 | -0.006241 | -0.004208 | 0.010170 | -0.001647 | -0.022919 | 0.006298 | -0.021164 | 0.054626 | -0.042542 | -0.026794 | 0.003531 | 0.001647 | 0.001576 | -0.003611 | -0.003761 | |
| 2 | 2987002 | 0 | 86469 | 4.078125 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | Microsoft | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | -5.0 | 191631.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | NotFound | 52.0 | NaN | Found | Found | 121.0 | 410.0 | 142.0 | Found | Found | NAN | Chrome | NaN | NaN | NaN | F | F | T | T | desktop | Windows | False | False | False | False | False | False | False | False | True | False | True | False | True | True | False | True | True | True | True | True | False | False | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | False | 2017-12-02 00:01:09 | 5 | 0 | 2 | -76.0 | -0.317871 | 0.607910 | 0.443115 | 0.589226 | 0.258545 | -0.800781 | 0.316895 | 0.273193 | -0.026352 | 0.043182 | -0.008064 | -0.039276 | -0.217041 | 0.017715 | 0.033508 | -0.000322 | -0.015343 | 0.020676 | -0.046051 | -0.006725 | 0.004875 | 0.001104 | 0.007484 | -0.007793 | -0.006611 | -0.008270 | 0.009857 | -0.007710 | 0.003191 | 0.002834 | 0.001886 | 0.003839 | 0.002903 | -0.019592 | -0.003424 |
| 3 | 2987003 | 0 | 86499 | 3.912109 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | Yahoo Mail | NoInf | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | -5.0 | 221832.0 | NaN | NaN | 0.0 | -6.0 | NaN | NaN | 100.0 | NotFound | 52.0 | NaN | New | NotFound | 225.0 | 176.0 | 507.0 | New | NotFound | NAN | Chrome | NaN | NaN | NaN | F | F | T | T | desktop | NaN | False | False | False | False | False | False | False | True | True | False | True | False | False | False | False | False | True | True | True | True | False | True | True | True | True | False | True | True | True | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | True | 2017-12-02 00:01:39 | 5 | 0 | 2 | -85.0 | -0.355469 | 0.405029 | 0.377686 | 0.259460 | 0.196899 | -0.237427 | -0.811523 | -0.123657 | -0.423828 | -0.067261 | 0.025040 | 0.110413 | -0.253906 | 0.004803 | 0.170410 | -0.012550 | -0.014488 | 0.005268 | 0.031891 | -0.013489 | -0.017319 | -0.001888 | -0.084717 | 0.050293 | 0.140747 | 0.058960 | -0.020218 | 0.066589 | -0.010910 | -0.017868 | 0.025528 | 0.003674 | 0.003511 | 0.026047 | -0.041962 |
| 4 | 2987004 | 0 | 86506 | 3.912109 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 7460.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 100.0 | NotFound | NaN | -300.0 | Found | Found | 166.0 | 529.0 | 575.0 | Found | Found | Mac | Chrome | 24.0 | 1280x800 | match_status:2 | T | F | T | T | desktop | MacOS | False | False | False | False | False | False | False | True | True | False | True | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | False | True | False | False | False | False | False | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 2017-12-02 00:01:46 | 5 | 0 | 2 | -85.0 | -0.355469 | 0.515625 | 0.377686 | 0.882898 | 0.196899 | 2.904297 | 0.380127 | 0.480713 | -0.009659 | -0.171753 | 1.170898 | -0.178223 | 0.004517 | 0.043793 | -0.001614 | 0.015610 | -0.017883 | -0.020523 | -0.005604 | -0.010262 | -0.003822 | -0.011459 | -0.008179 | 0.013390 | 0.017792 | -0.013702 | 0.000093 | -0.005245 | -0.142334 | 0.202271 | 0.014458 | 0.012764 | 0.002150 | 0.014008 | -0.001770 |
5 rows × 537 columns
# Plot first 2 PCA features and colour by target variable
plt.figure(figsize=(12, 8));
groups = df.groupby("isFraud")
for name, group in groups:
plt.scatter(group["PCA_V_0"], group["PCA_V_1"], label=name)
plt.legend()
plt.show()
Encoding is the process of converting data from one form to another. Most of the Machine learning algorithms can not handle categorical values unless we convert them to numerical values. Many algorithm’s performances vary based on how Categorical columns are encoded.
Create a list of variables that needs to be encoded using frequency encoding. Let's note down the features which has more than 30 unique values, We would using frequency encoding for these features only
cat_columns = df.select_dtypes(include=['object']).columns
len(cat_columns)
30
binary_columns = [col for col in df.columns if df[col].nunique() == 2]
len(binary_columns)
432
num_columns = [col for col in df.columns if (col not in cat_columns) & (col not in binary_columns)]
len(num_columns)
92
cat_columns = cat_columns.to_list() + binary_columns
# Frequecny encoding variables
frequency_encoded_variables = []
for col in cat_columns:
if df[col].nunique() > 30:
print(col, df[col].nunique())
frequency_encoded_variables.append(col)
id_33 260 DeviceInfo 1786 Date 573349
It's time to encode the variables using frequency encoding
# Frequecny enocde the variables
for variable in tqdm(frequency_encoded_variables):
# group by frequency
fq = df.groupby(variable).size()/len(df)
# mapping values to dataframe
df.loc[:, "{}".format(variable)] = df[variable].map(fq)
cat_columns.remove(variable)
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.18it/s]
df.head()
| TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_19 | id_20 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | card2_missing_flag | card3_missing_flag | card4_missing_flag | card5_missing_flag | card6_missing_flag | addr1_missing_flag | addr2_missing_flag | dist1_missing_flag | dist2_missing_flag | P_emaildomain_missing_flag | R_emaildomain_missing_flag | D1_missing_flag | D2_missing_flag | D3_missing_flag | D4_missing_flag | D5_missing_flag | D6_missing_flag | D7_missing_flag | D8_missing_flag | D9_missing_flag | D10_missing_flag | D11_missing_flag | D12_missing_flag | D13_missing_flag | D14_missing_flag | D15_missing_flag | M1_missing_flag | M2_missing_flag | M3_missing_flag | M4_missing_flag | M5_missing_flag | M6_missing_flag | M7_missing_flag | M8_missing_flag | M9_missing_flag | V1_missing_flag | V2_missing_flag | V3_missing_flag | V4_missing_flag | V5_missing_flag | V6_missing_flag | V7_missing_flag | V8_missing_flag | V9_missing_flag | V10_missing_flag | V11_missing_flag | V12_missing_flag | V13_missing_flag | V14_missing_flag | V15_missing_flag | V16_missing_flag | V17_missing_flag | V18_missing_flag | V19_missing_flag | V20_missing_flag | V21_missing_flag | V22_missing_flag | V23_missing_flag | V24_missing_flag | V25_missing_flag | V26_missing_flag | V27_missing_flag | V28_missing_flag | V29_missing_flag | V30_missing_flag | V31_missing_flag | V32_missing_flag | V33_missing_flag | V34_missing_flag | V35_missing_flag | V36_missing_flag | V37_missing_flag | V38_missing_flag | V39_missing_flag | V40_missing_flag | V41_missing_flag | V42_missing_flag | V43_missing_flag | V44_missing_flag | V45_missing_flag | V46_missing_flag | V47_missing_flag | V48_missing_flag | V49_missing_flag | V50_missing_flag | V51_missing_flag | V52_missing_flag | V53_missing_flag | V54_missing_flag | V55_missing_flag | V56_missing_flag | V57_missing_flag | V58_missing_flag | V59_missing_flag | V60_missing_flag | V61_missing_flag | V62_missing_flag | V63_missing_flag | V64_missing_flag | V65_missing_flag | V66_missing_flag | V67_missing_flag | V68_missing_flag | V69_missing_flag | V70_missing_flag | V71_missing_flag | V72_missing_flag | V73_missing_flag | V74_missing_flag | V75_missing_flag | V76_missing_flag | V77_missing_flag | V78_missing_flag | V79_missing_flag | V80_missing_flag | V81_missing_flag | V82_missing_flag | V83_missing_flag | V84_missing_flag | V85_missing_flag | V86_missing_flag | V87_missing_flag | V88_missing_flag | V89_missing_flag | V90_missing_flag | V91_missing_flag | V92_missing_flag | V93_missing_flag | V94_missing_flag | V95_missing_flag | V96_missing_flag | V97_missing_flag | V98_missing_flag | V99_missing_flag | V100_missing_flag | V101_missing_flag | V102_missing_flag | V103_missing_flag | V104_missing_flag | V105_missing_flag | V106_missing_flag | V107_missing_flag | V108_missing_flag | V109_missing_flag | V110_missing_flag | V111_missing_flag | V112_missing_flag | V113_missing_flag | V114_missing_flag | V115_missing_flag | V116_missing_flag | V117_missing_flag | V118_missing_flag | V119_missing_flag | V120_missing_flag | V121_missing_flag | V122_missing_flag | V123_missing_flag | V124_missing_flag | V125_missing_flag | V126_missing_flag | V127_missing_flag | V128_missing_flag | V129_missing_flag | V130_missing_flag | V131_missing_flag | V132_missing_flag | ... | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | Date | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | PCA_V_0 | PCA_V_1 | PCA_V_2 | PCA_V_3 | PCA_V_4 | PCA_V_5 | PCA_V_6 | PCA_V_7 | PCA_V_8 | PCA_V_9 | PCA_V_10 | PCA_V_11 | PCA_V_12 | PCA_V_13 | PCA_V_14 | PCA_V_15 | PCA_V_16 | PCA_V_17 | PCA_V_18 | PCA_V_19 | PCA_V_20 | PCA_V_21 | PCA_V_22 | PCA_V_23 | PCA_V_24 | PCA_V_25 | PCA_V_26 | PCA_V_27 | PCA_V_28 | PCA_V_29 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 4.226562 | W | 13926 | NaN | 150.0 | discover | 142.0 | credit | 315.0 | 87.0 | 19.0 | NoInf | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.0 | 14.0 | NaN | 13.0 | NaN | NaN | NaN | NaN | NaN | 13.0 | 13.0 | NaN | NaN | NaN | 0.0 | T | T | T | M2 | F | T | NaN | NaN | NaN | 0.0 | 70787.0 | NaN | NaN | NaN | NaN | NaN | NaN | 100.0 | NotFound | NaN | -480.0 | New | NotFound | 166.0 | 542.0 | 144.0 | New | NotFound | Android | Samsung | 32.0 | 0.000921 | match_status:2 | T | F | T | T | mobile | 0.000015 | True | False | False | False | False | False | False | False | True | True | True | False | True | False | True | True | True | True | True | True | False | False | True | True | True | False | False | False | False | False | False | False | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | True | True | True | True | True | True | False | False | True | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 0.000002 | 5 | 0 | 2 | -66.5 | -0.278076 | 0.194580 | 0.257812 | 0.184560 | 0.170288 | -0.157349 | 0.919434 | -0.843750 | 0.308105 | -0.089417 | 0.003044 | -0.020050 | -0.187622 | 0.038208 | 0.002604 | -0.010536 | 0.034058 | -0.044434 | -0.089722 | 0.044769 | 0.001550 | -0.003441 | 0.018616 | -0.018387 | 0.010078 | -0.026947 | -0.021362 | -0.054626 | 0.025375 | 0.018814 | -0.006039 | 0.004055 | -0.043335 | 0.008026 | -0.007957 |
| 1 | 2987001 | 0 | 86401 | 3.367188 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | credit | 325.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | M0 | T | T | NaN | NaN | NaN | -5.0 | 98945.0 | NaN | NaN | 0.0 | -5.0 | NaN | NaN | 100.0 | NotFound | 49.0 | -300.0 | New | NotFound | 166.0 | 621.0 | 500.0 | New | NotFound | iOS | Safari | 32.0 | 0.010917 | match_status:1 | T | F | F | T | mobile | 0.033498 | False | False | False | False | False | False | False | True | True | False | True | False | True | True | False | True | True | True | True | True | False | True | True | True | True | False | True | True | True | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | False | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 0.000002 | 5 | 0 | 2 | -106.0 | -0.443359 | 0.123779 | 0.218994 | 0.063004 | 0.114197 | -0.086365 | -0.800293 | -0.152344 | -0.363525 | -0.101868 | -0.002291 | 0.032318 | -0.068848 | 0.040222 | -0.180176 | -0.059387 | 0.002302 | 0.018982 | -0.029556 | 0.016647 | -0.006241 | -0.004208 | 0.010170 | -0.001647 | -0.022919 | 0.006298 | -0.021164 | 0.054626 | -0.042542 | -0.026794 | 0.003531 | 0.001647 | 0.001576 | -0.003611 | -0.003761 | |
| 2 | 2987002 | 0 | 86469 | 4.078125 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | debit | 330.0 | 87.0 | 287.0 | Microsoft | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 315.0 | NaN | NaN | NaN | 315.0 | T | T | T | M0 | F | F | F | F | F | -5.0 | 191631.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | NotFound | 52.0 | NaN | Found | Found | 121.0 | 410.0 | 142.0 | Found | Found | NAN | Chrome | NaN | NaN | NaN | F | F | T | T | desktop | 0.080811 | False | False | False | False | False | False | False | False | True | False | True | False | True | True | False | True | True | True | True | True | False | False | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | False | 0.000002 | 5 | 0 | 2 | -76.0 | -0.317871 | 0.607910 | 0.443115 | 0.589226 | 0.258545 | -0.800781 | 0.316895 | 0.273193 | -0.026352 | 0.043182 | -0.008064 | -0.039276 | -0.217041 | 0.017715 | 0.033508 | -0.000322 | -0.015343 | 0.020676 | -0.046051 | -0.006725 | 0.004875 | 0.001104 | 0.007484 | -0.007793 | -0.006611 | -0.008270 | 0.009857 | -0.007710 | 0.003191 | 0.002834 | 0.001886 | 0.003839 | 0.002903 | -0.019592 | -0.003424 |
| 3 | 2987003 | 0 | 86499 | 3.912109 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | debit | 476.0 | 87.0 | NaN | Yahoo Mail | NoInf | 2.0 | 5.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 25.0 | 1.0 | 112.0 | 112.0 | 0.0 | 94.0 | 0.0 | NaN | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN | 111.0 | NaN | NaN | NaN | M0 | T | F | NaN | NaN | NaN | -5.0 | 221832.0 | NaN | NaN | 0.0 | -6.0 | NaN | NaN | 100.0 | NotFound | 52.0 | NaN | New | NotFound | 225.0 | 176.0 | 507.0 | New | NotFound | NAN | Chrome | NaN | NaN | NaN | F | F | T | T | desktop | NaN | False | False | False | False | False | False | False | True | True | False | True | False | False | False | False | False | True | True | True | True | False | True | True | True | True | False | True | True | True | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | True | True | False | False | True | True | True | True | False | False | False | True | False | False | False | True | False | False | True | True | True | True | True | True | True | False | False | True | False | True | True | True | False | False | False | False | False | True | 0.000002 | 5 | 0 | 2 | -85.0 | -0.355469 | 0.405029 | 0.377686 | 0.259460 | 0.196899 | -0.237427 | -0.811523 | -0.123657 | -0.423828 | -0.067261 | 0.025040 | 0.110413 | -0.253906 | 0.004803 | 0.170410 | -0.012550 | -0.014488 | 0.005268 | 0.031891 | -0.013489 | -0.017319 | -0.001888 | -0.084717 | 0.050293 | 0.140747 | 0.058960 | -0.020218 | 0.066589 | -0.010910 | -0.017868 | 0.025528 | 0.003674 | 0.003511 | 0.026047 | -0.041962 |
| 4 | 2987004 | 0 | 86506 | 3.912109 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | credit | 420.0 | 87.0 | NaN | NoInf | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 7460.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 100.0 | NotFound | NaN | -300.0 | Found | Found | 166.0 | 529.0 | 575.0 | Found | Found | Mac | Chrome | 24.0 | 0.003639 | match_status:2 | T | F | T | T | desktop | 0.021291 | False | False | False | False | False | False | False | True | True | False | True | False | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | False | True | False | False | False | False | False | False | False | True | True | True | True | True | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | 0.000002 | 5 | 0 | 2 | -85.0 | -0.355469 | 0.515625 | 0.377686 | 0.882898 | 0.196899 | 2.904297 | 0.380127 | 0.480713 | -0.009659 | -0.171753 | 1.170898 | -0.178223 | 0.004517 | 0.043793 | -0.001614 | 0.015610 | -0.017883 | -0.020523 | -0.005604 | -0.010262 | -0.003822 | -0.011459 | -0.008179 | 0.013390 | 0.017792 | -0.013702 | 0.000093 | -0.005245 | -0.142334 | 0.202271 | 0.014458 | 0.012764 | 0.002150 | 0.014008 | -0.001770 |
5 rows × 537 columns
It is a popular encoding technique for handling categorical variables. In this technique, each label is assigned a unique integer based on alphabetical ordering.
# Label encode the variables
for col in cat_columns:
lbl = LabelEncoder()
lbl.fit(list(df[col].values))
df[col] = lbl.transform(list(df[col].values))
Let's reduce the memory usage as lot of new columns has been added to the data frame
# Reduce memory usage
df = reduce_mem_usage(df)
Mem. usage decreased to 361.00 Mb (82.7% reduction)
Tip : Save the train df, and clean all memory
# Save train df to csv file
df.to_csv("Intermediate_Datasets/df_intermediate2.csv", index = False)
The goal of this section is to:
# Read train df
df = pd.read_csv("Intermediate_Datasets/df_intermediate2.csv")
# df = df.sample(10000, random_state=0)
df.loc[:, 'isFraud'].value_counts()
0 569877 1 20663 Name: isFraud, dtype: int64
Drop the columns which may not be useful for model building
df = df.drop(['TransactionID','TransactionDT','Date'], axis=1)
Separate the x variables and y variables
# Split the y variable series and x variables dataset
X = df.drop(['isFraud'],axis=1)
y = df.isFraud.astype(bool)
# Delete train df
del df
# Collect garbage
gc.collect()
0
Split the dataset into train set and test set. Train set will be used to train the model. Test set will be used to check the performance of model
# Split the dataset into the training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
# Head of X_train
X_train.head()
| TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_19 | id_20 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | card2_missing_flag | card3_missing_flag | card4_missing_flag | card5_missing_flag | card6_missing_flag | addr1_missing_flag | addr2_missing_flag | dist1_missing_flag | dist2_missing_flag | P_emaildomain_missing_flag | R_emaildomain_missing_flag | D1_missing_flag | D2_missing_flag | D3_missing_flag | D4_missing_flag | D5_missing_flag | D6_missing_flag | D7_missing_flag | D8_missing_flag | D9_missing_flag | D10_missing_flag | D11_missing_flag | D12_missing_flag | D13_missing_flag | D14_missing_flag | D15_missing_flag | M1_missing_flag | M2_missing_flag | M3_missing_flag | M4_missing_flag | M5_missing_flag | M6_missing_flag | M7_missing_flag | M8_missing_flag | M9_missing_flag | V1_missing_flag | V2_missing_flag | V3_missing_flag | V4_missing_flag | V5_missing_flag | V6_missing_flag | V7_missing_flag | V8_missing_flag | V9_missing_flag | V10_missing_flag | V11_missing_flag | V12_missing_flag | V13_missing_flag | V14_missing_flag | V15_missing_flag | V16_missing_flag | V17_missing_flag | V18_missing_flag | V19_missing_flag | V20_missing_flag | V21_missing_flag | V22_missing_flag | V23_missing_flag | V24_missing_flag | V25_missing_flag | V26_missing_flag | V27_missing_flag | V28_missing_flag | V29_missing_flag | V30_missing_flag | V31_missing_flag | V32_missing_flag | V33_missing_flag | V34_missing_flag | V35_missing_flag | V36_missing_flag | V37_missing_flag | V38_missing_flag | V39_missing_flag | V40_missing_flag | V41_missing_flag | V42_missing_flag | V43_missing_flag | V44_missing_flag | V45_missing_flag | V46_missing_flag | V47_missing_flag | V48_missing_flag | V49_missing_flag | V50_missing_flag | V51_missing_flag | V52_missing_flag | V53_missing_flag | V54_missing_flag | V55_missing_flag | V56_missing_flag | V57_missing_flag | V58_missing_flag | V59_missing_flag | V60_missing_flag | V61_missing_flag | V62_missing_flag | V63_missing_flag | V64_missing_flag | V65_missing_flag | V66_missing_flag | V67_missing_flag | V68_missing_flag | V69_missing_flag | V70_missing_flag | V71_missing_flag | V72_missing_flag | V73_missing_flag | V74_missing_flag | V75_missing_flag | V76_missing_flag | V77_missing_flag | V78_missing_flag | V79_missing_flag | V80_missing_flag | V81_missing_flag | V82_missing_flag | V83_missing_flag | V84_missing_flag | V85_missing_flag | V86_missing_flag | V87_missing_flag | V88_missing_flag | V89_missing_flag | V90_missing_flag | V91_missing_flag | V92_missing_flag | V93_missing_flag | V94_missing_flag | V95_missing_flag | V96_missing_flag | V97_missing_flag | V98_missing_flag | V99_missing_flag | V100_missing_flag | V101_missing_flag | V102_missing_flag | V103_missing_flag | V104_missing_flag | V105_missing_flag | V106_missing_flag | V107_missing_flag | V108_missing_flag | V109_missing_flag | V110_missing_flag | V111_missing_flag | V112_missing_flag | V113_missing_flag | V114_missing_flag | V115_missing_flag | V116_missing_flag | V117_missing_flag | V118_missing_flag | V119_missing_flag | V120_missing_flag | V121_missing_flag | V122_missing_flag | V123_missing_flag | V124_missing_flag | V125_missing_flag | V126_missing_flag | V127_missing_flag | V128_missing_flag | V129_missing_flag | V130_missing_flag | V131_missing_flag | V132_missing_flag | V133_missing_flag | V134_missing_flag | V135_missing_flag | ... | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | PCA_V_0 | PCA_V_1 | PCA_V_2 | PCA_V_3 | PCA_V_4 | PCA_V_5 | PCA_V_6 | PCA_V_7 | PCA_V_8 | PCA_V_9 | PCA_V_10 | PCA_V_11 | PCA_V_12 | PCA_V_13 | PCA_V_14 | PCA_V_15 | PCA_V_16 | PCA_V_17 | PCA_V_18 | PCA_V_19 | PCA_V_20 | PCA_V_21 | PCA_V_22 | PCA_V_23 | PCA_V_24 | PCA_V_25 | PCA_V_26 | PCA_V_27 | PCA_V_28 | PCA_V_29 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 464452 | 3.432 | 4 | 15063 | 514.0 | 150.0 | 4 | 226.0 | 1 | 315.0 | 87.0 | 18.0 | 0 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 35.0 | 35.0 | 34.0 | 34.0 | 34.0 | NaN | NaN | NaN | 34.0 | 34.0 | NaN | NaN | NaN | 34.0 | 1 | 1 | 1 | 3 | 2 | 1 | 0 | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 4 | 18 | -104.06 | -0.4350 | 0.1501 | 0.2324 | 0.092611 | 0.13560 | -0.9330 | 0.3564 | 0.2969 | -0.05145 | 0.012596 | 0.01012 | 0.02788 | 0.1576 | -0.053300 | 0.13940 | 0.033940 | 0.021680 | -0.04034 | 0.02574 | -0.022110 | -0.003386 | 0.019290 | 0.013780 | -0.007637 | -0.031980 | 0.012856 | -0.007412 | 0.000978 | -0.016700 | -0.007600 | -0.027450 | 0.00541 | -0.006160 | 0.005920 | 0.008170 |
| 36372 | 3.717 | 0 | 6019 | 583.0 | 150.0 | 4 | 226.0 | 1 | NaN | NaN | NaN | 0 | 0 | 928.0 | 1301.0 | 0.0 | 475.0 | 0.0 | 475.0 | 476.0 | 688.0 | 0.0 | 823.0 | 667.0 | 667.0 | 695.0 | 300.0 | 0.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 302.0 | 0.0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | -5.0 | 49333.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 100.0 | 1 | 52.0 | -360.0 | 0 | 0 | 166.0 | 410.0 | 611.0 | 0 | 0 | 4 | 0 | 24.0 | 0.02858 | 3 | 1 | 0 | 1 | 0 | 0 | 0.0808 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 11 | -93.94 | -0.3928 | 0.1823 | 0.3086 | 0.103778 | 0.18000 | 0.6445 | -0.9595 | 0.6094 | 0.50200 | -0.377700 | -0.15760 | 0.07880 | 0.1742 | 1.205000 | 0.12660 | -0.007683 | -0.130900 | -0.04294 | 0.00345 | 0.057530 | 0.048770 | -0.015470 | -0.016300 | 0.073850 | 0.016130 | -0.018260 | 0.036070 | -0.019200 | -0.050660 | -0.045530 | -0.032350 | -0.02182 | 0.038240 | -0.009430 | 0.004864 |
| 572387 | 5.527 | 4 | 17188 | 321.0 | 150.0 | 4 | 226.0 | 2 | 299.0 | 87.0 | 1.0 | 6 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 14.0 | 1.0 | 568.0 | 80.0 | 1.0 | 569.0 | 1.0 | NaN | NaN | NaN | 569.0 | 569.0 | NaN | NaN | NaN | 569.0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 21 | 25 | 116.00 | 0.4849 | 1.9370 | 1.8850 | 1.348902 | 1.10000 | -0.8013 | 0.3174 | 0.2720 | -0.02687 | 0.049070 | -0.00634 | -0.02972 | -0.2195 | 0.019070 | 0.03270 | -0.009254 | -0.009834 | 0.02269 | -0.02753 | -0.009140 | -0.002127 | -0.001934 | -0.025130 | 0.016200 | 0.040620 | 0.000128 | 0.001646 | -0.004090 | 0.010020 | 0.002650 | 0.010440 | 0.00992 | 0.005665 | -0.008720 | -0.011665 |
| 497276 | 3.950 | 4 | 1214 | 174.0 | 150.0 | 4 | 226.0 | 1 | 204.0 | 87.0 | NaN | 4 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | 1 | 1 | 1 | 3 | 2 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 19 | 30 | -83.06 | -0.3474 | 0.2540 | 0.3901 | 0.147945 | 0.22770 | -0.9326 | 0.3564 | 0.2974 | -0.05154 | 0.010610 | 0.00979 | 0.02590 | 0.1611 | -0.054500 | 0.13270 | 0.035740 | -0.010574 | -0.00642 | -0.03049 | -0.012240 | 0.017960 | 0.018100 | 0.002857 | -0.016740 | 0.007100 | 0.018840 | -0.006878 | 0.008800 | -0.004406 | 0.000245 | -0.014570 | -0.00884 | -0.011110 | -0.000597 | -0.007458 |
| 97470 | 3.219 | 1 | 12695 | 490.0 | 150.0 | 4 | 226.0 | 2 | 325.0 | 87.0 | NaN | 6 | 6 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | -5.0 | 68281.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | 100.0 | 1 | 52.0 | NaN | 1 | 1 | 225.0 | 266.0 | 507.0 | 1 | 1 | 3 | 0 | NaN | NaN | 4 | 0 | 0 | 1 | 0 | 0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 21 | 23 | -110.00 | -0.4600 | 0.1771 | 0.1877 | 0.116133 | 0.10956 | 2.8070 | 0.3457 | 0.3772 | -0.01767 | -0.042420 | -0.24430 | 0.02310 | 0.0118 | 0.000914 | 0.02304 | -0.009550 | -0.022300 | -0.02167 | -0.01366 | -0.005627 | -0.009690 | 0.001454 | -0.008240 | -0.008900 | 0.005936 | -0.004475 | -0.000716 | -0.003060 | 0.008514 | -0.017840 | 0.000079 | 0.01335 | -0.005287 | -0.000648 | 0.007446 |
5 rows × 533 columns
Finally, model building starts here.
The goal of this section is to:
XGBoost is an optimized distributed gradient boosting model designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way.
%%time
# Define the model
xgb = XGBClassifier(nthread = -1, random_state=0)
# Train the model
xgb.fit(X_train, y_train)
xgb
Wall time: 10min 13s
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints='',
learning_rate=0.300000012, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan, monotone_constraints='()',
n_estimators=100, n_jobs=-1, nthread=-1, num_parallel_tree=1,
random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
subsample=1, tree_method='exact', validate_parameters=1,
verbosity=None)
Let's use the model to get predictions on test dataset. We would be looking at the predicted class and predicted probability both in order to evaluate the performance of the model
# Prediction
y_pred_xgb = xgb.predict(X_test)
y_prob_pred_xgb = xgb.predict_proba(X_test)
y_prob_pred_xgb = [x[1] for x in y_prob_pred_xgb]
print("Y predicted : ",y_pred_xgb)
print("Y probability predicted : ",y_prob_pred_xgb[:5])
Y predicted : [False False False ... False False False] Y probability predicted : [0.00088362244, 0.0128200315, 0.003954415, 0.008103509, 0.0021388768]
Concordance
from bisect import bisect_left, bisect_right
def concordance(actuals, preds):
ones_preds = [p for a,p in zip(actuals, preds) if a == 1]
zeros_preds = [p for a,p in zip(actuals, preds) if a == 0]
n_ones = len([x for x in actuals if x == 1])
n_total_pairs = float(n_ones) * float(len(actuals) - n_ones)
# print("Total Pairs: ", n_total_pairs)
zeros_sorted = sorted(zeros_preds)
conc = 0; disc = 0; ties = 0;
for i, one_pred in enumerate(ones_preds):
cur_conc = bisect_left(zeros_sorted, one_pred)
cur_ties = bisect_right(zeros_sorted, one_pred) - cur_conc
conc += cur_conc
ties += cur_ties
disc += float(len(zeros_sorted)) - cur_ties - cur_conc
concordance = conc/n_total_pairs
discordance = disc/n_total_pairs
ties_perc = ties/n_total_pairs
return concordance
All evaluation metrics
def compute_evaluation_metric(model, x_test, y_actual, y_predicted, y_predicted_prob):
print("\n Accuracy Score : ",accuracy_score(y_actual,y_predicted))
print("\n AUC Score : ", roc_auc_score(y_actual, y_predicted_prob))
print("\n Confusion Matrix : \n",confusion_matrix(y_actual, y_predicted))
print("\n Classification Report : \n",classification_report(y_actual, y_predicted))
print("\n Concordance Index : ", concordance(y_actual, y_predicted_prob))
print("\n ROC curve : \n")
plot_roc_curve(model, x_test, y_actual)
plt.show()
print("\n PR curve : \n")
plot_precision_recall_curve(model, x_test, y_actual)
plt.show()
concordance(y_test.values, y_prob_pred_xgb)
0.9348031547662842
# Compute Evaluation Metric
compute_evaluation_metric(xgb, X_test, y_test, y_pred_xgb, y_prob_pred_xgb)
Accuracy Score : 0.9797247716778993
AUC Score : 0.9348031681385632
Confusion Matrix :
[[170693 348]
[ 3244 2877]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.89 0.47 0.62 6121
accuracy 0.98 177162
macro avg 0.94 0.73 0.80 177162
weighted avg 0.98 0.98 0.98 177162
Concordance Index : 0.9348031547662842
ROC curve :
PR curve :
Divide the data in 10 equal bins as per predicted probability scores. Then, compute the percentage of the total target class 1 captured in every bin.
Ideally the proportion should be decreasing as we go down ever bin. Let's check it out
# Create Validation set
validation_df = {'y_test' : y_test, 'y_pred' : y_pred_xgb, 'y_pred_prob' : y_prob_pred_xgb}
validation_df = pd.DataFrame(data = validation_df)
# Add binning column to the dataframe
validation_df['bin_y_pred_prob'] = pd.qcut(validation_df['y_pred_prob'], q=10)
validation_df.head()
| y_test | y_pred | y_pred_prob | bin_y_pred_prob | |
|---|---|---|---|---|
| 7681 | False | False | 0.000884 | (-0.0009859, 0.00121] |
| 570242 | False | False | 0.012820 | (0.00914, 0.0132] |
| 340470 | False | False | 0.003954 | (0.00333, 0.00477] |
| 131781 | False | False | 0.008104 | (0.00659, 0.00914] |
| 472772 | False | False | 0.002139 | (0.00121, 0.00219] |
# Change x label
x_label = []
for i in range(len(validation_df['bin_y_pred_prob'].cat.categories[::-1].astype('str'))):
x_label.append("Bin" + str(i + 1)+ "(" + validation_df['bin_y_pred_prob'].cat.categories[::-1].astype('str')[i] + ")")
# Plot Distribution of predicted probabilities for every bin
plt.figure(figsize=(12, 8));
sns.stripplot(validation_df.bin_y_pred_prob, validation_df.y_pred_prob, jitter = 0.15, hue = validation_df.y_test, order = validation_df['bin_y_pred_prob'].cat.categories[::-1])
plt.title("Distribution of predicted probabilities for every bin", fontsize=18)
plt.xlabel("Predicted Probability Bins", fontsize=14);
plt.ylabel("Predicted Probability", fontsize=14);
plt.xticks(np.arange(10), x_label, rotation=45);
plt.show()
# Aggregate the data
gains_df = validation_df.groupby(["bin_y_pred_prob","y_test"]).agg({'y_test': ['count']})
gains_df.columns = gains_df.columns.map(''.join)
gains_df['prob_bin'] = gains_df.index.get_level_values(0)
gains_df['y_test'] = gains_df.index.get_level_values(1)
gains_df.reset_index(drop = True, inplace = True)
gains_df
# Get infection rate and percentage infections
gains_table = gains_df.pivot(index='prob_bin', columns='y_test', values='y_testcount')
gains_table['prob_bin'] = gains_table.index
gains_table = gains_table.iloc[::-1]
gains_table['prob_bin'] = x_label
gains_table.reset_index(drop = True, inplace = True)
gains_table = gains_table[['prob_bin', 0, 1]]
gains_table.columns = ['prob_bin', "not_fraud", "fraud"]
gains_table['perc_fraud'] = gains_table['fraud']/gains_table['fraud'].sum()
gains_table['perc_not_fraud'] = gains_table['not_fraud']/gains_table['not_fraud'].sum()
gains_table['cum_perc_fraud'] = 100*(gains_table.fraud.cumsum() / gains_table.fraud.sum())
gains_table['cum_perc_not_fraud'] = 100*(gains_table.not_fraud.cumsum() / gains_table.not_fraud.sum())
gains_table
# Plot
plt.figure(figsize=(12, 8));
sns.set_style("white")
sns.pointplot(x = "prob_bin", y = "cum_perc_fraud", data = gains_table, legend = False, order=gains_table.prob_bin)
plt.xticks(rotation=45);
plt.ylabel("Fraud Rate", fontsize=14)
plt.xlabel("Prediction probability bin", fontsize=14)
plt.title("Fraud rate for every bin", fontsize=18)
plt.show()
Ideally the slope should be high initially and should decrease as we move further to the right. This is not really a good model.
# One big function.
def captures(y_test, y_pred, y_pred_prob):
# Create Validation set
validation_df = {'y_test' : y_test, 'y_pred' : y_pred, 'y_pred_prob' : y_pred_prob}
validation_df = pd.DataFrame(data = validation_df)
# Add binning column to the dataframe
try:
validation_df['bin_y_pred_prob'] = pd.qcut(validation_df['y_pred_prob'], q=10)
except:
validation_df['bin_y_pred_prob'] = pd.qcut(validation_df['y_pred_prob'], q=10, duplicates='drop')
# Change x label and column names
x_label = []
for i in range(len(validation_df['bin_y_pred_prob'].cat.categories[::-1].astype('str'))):
x_label.append("Bin" + str(i + 1)+ "(" + validation_df['bin_y_pred_prob'].cat.categories[::-1].astype('str')[i] + ")")
# Plot Distribution of predicted probabilities for every bin
plt.figure(figsize=(12, 8));
sns.stripplot(validation_df.bin_y_pred_prob, validation_df.y_pred_prob, jitter = 0.15, hue = validation_df.y_test, order = validation_df['bin_y_pred_prob'].cat.categories[::-1])
plt.title("Distribution of predicted probabilities for every bin", fontsize=18)
plt.xlabel("Predicted Probability Bins", fontsize=14);
plt.ylabel("Predicted Probability", fontsize=14);
try:
plt.xticks(np.arange(10), x_label, rotation=45);
except:
pass
plt.show()
# Aggregate the data
gains_df = validation_df.groupby(["bin_y_pred_prob","y_test"]).agg({'y_test': ['count']})
gains_df.columns = gains_df.columns.map(''.join)
gains_df['prob_bin'] = gains_df.index.get_level_values(0)
gains_df['y_test'] = gains_df.index.get_level_values(1)
gains_df.reset_index(drop = True, inplace = True)
gains_df
# Get infection rate and percentage infections
gains_table = gains_df.pivot(index='prob_bin', columns='y_test', values='y_testcount')
gains_table['prob_bin'] = gains_table.index
gains_table = gains_table.iloc[::-1]
gains_table['prob_bin'] = x_label
gains_table.reset_index(drop = True, inplace = True)
gains_table = gains_table[['prob_bin', 0, 1]]
gains_table.columns = ['prob_bin', "not_fraud", "fraud"]
gains_table['perc_fraud'] = gains_table['fraud']/gains_table['fraud'].sum()
gains_table['perc_not_fraud'] = gains_table['not_fraud']/gains_table['not_fraud'].sum()
gains_table['cum_perc_fraud'] = 100*(gains_table.fraud.cumsum() / gains_table.fraud.sum())
gains_table['cum_perc_not_fraud'] = 100*(gains_table.not_fraud.cumsum() / gains_table.not_fraud.sum())
gains_table
# Plot
plt.figure(figsize=(12, 8));
sns.set_style("white")
sns.pointplot(x = "prob_bin", y = "cum_perc_fraud", data = gains_table, legend = False, order=gains_table.prob_bin)
plt.xticks(rotation=45);
plt.ylabel("Fraud Rate", fontsize=14)
plt.xlabel("Prediction probability bin", fontsize=14)
plt.title("Fraud rate for every bin", fontsize=18)
plt.show()
return gains_table
# Gains Table and Capture rates plot
captures(y_test, y_pred_xgb, y_prob_pred_xgb)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0472, 1.0]) | 12835 | 4882 | 0.797582 | 0.075040 | 79.758209 | 7.504049 |
| 1 | Bin2((0.0215, 0.0472]) | 17204 | 512 | 0.083646 | 0.100584 | 88.122856 | 17.562456 |
| 2 | Bin3((0.0132, 0.0215]) | 17460 | 256 | 0.041823 | 0.102081 | 92.305179 | 27.770535 |
| 3 | Bin4((0.00914, 0.0132]) | 17572 | 144 | 0.023526 | 0.102736 | 94.657736 | 38.044095 |
| 4 | Bin5((0.00659, 0.00914]) | 17606 | 109 | 0.017808 | 0.102934 | 96.438490 | 48.337533 |
| 5 | Bin6((0.00477, 0.00659]) | 17649 | 68 | 0.011109 | 0.103186 | 97.549420 | 58.656112 |
| 6 | Bin7((0.00333, 0.00477]) | 17662 | 54 | 0.008822 | 0.103262 | 98.431629 | 68.982291 |
| 7 | Bin8((0.00219, 0.00333]) | 17668 | 48 | 0.007842 | 0.103297 | 99.215814 | 79.311978 |
| 8 | Bin9((0.00121, 0.00219]) | 17689 | 27 | 0.004411 | 0.103420 | 99.656919 | 89.653943 |
| 9 | Bin10((-0.0009859, 0.00121]) | 17696 | 21 | 0.003431 | 0.103461 | 100.000000 | 100.000000 |
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt
def draw_calibration_curve(y_test, y_prob, n_bins=10):
plt.figure(figsize=(7, 7), dpi=120)
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
ax2 = plt.subplot2grid((3, 1), (2, 0))
ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_prob, n_bins=10)
ax1.plot(mean_predicted_value, fraction_of_positives, "s-", label="%s" % ("Model", ))
ax2.hist(y_prob, range=(0, 1), bins=10, label="Model", histtype="step", lw=2)
# Labels
ax1.set_ylabel("Fraction of positives")
ax1.set_ylim([-0.05, 1.05])
ax1.legend(loc="lower right")
ax1.set_title('Calibration plots (reliability curve)')
ax2.set_xlabel("Mean predicted value")
ax2.set_ylabel("Count")
ax2.legend(loc="upper center", ncol=2)
plt.tight_layout()
plt.show()
draw_calibration_curve(y_test, y_prob_pred_xgb, n_bins=10)
Logistic regression
# Prediction
y_pred_xgb_test = xgb.predict(X_test)
y_prob_pred_xgb_test = xgb.predict_proba(X_test)[:, 1]
from sklearn.linear_model import LogisticRegression
X = np.array(y_prob_pred_xgb_test)
clf = LogisticRegression(random_state=0).fit(X.reshape(-1, 1), y_test)
y_prob_pred_calib = clf.predict_proba(X.reshape(-1, 1))[:, 1]
y_pred_calib = clf.predict(X.reshape(-1, 1))
captures(y_test, y_pred_calib, y_prob_pred_calib)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0188, 0.998]) | 12835 | 4882 | 0.797582 | 0.075040 | 79.758209 | 7.504049 |
| 1 | Bin2((0.0144, 0.0188]) | 17204 | 512 | 0.083646 | 0.100584 | 88.122856 | 17.562456 |
| 2 | Bin3((0.0132, 0.0144]) | 17460 | 256 | 0.041823 | 0.102081 | 92.305179 | 27.770535 |
| 3 | Bin4((0.0127, 0.0132]) | 17572 | 144 | 0.023526 | 0.102736 | 94.657736 | 38.044095 |
| 4 | Bin5((0.0124, 0.0127]) | 17606 | 109 | 0.017808 | 0.102934 | 96.438490 | 48.337533 |
| 5 | Bin6((0.0121, 0.0124]) | 17649 | 68 | 0.011109 | 0.103186 | 97.549420 | 58.656112 |
| 6 | Bin7((0.0119, 0.0121]) | 17662 | 54 | 0.008822 | 0.103262 | 98.431629 | 68.982291 |
| 7 | Bin8((0.0118, 0.0119]) | 17668 | 48 | 0.007842 | 0.103297 | 99.215814 | 79.311978 |
| 8 | Bin9((0.0117, 0.0118]) | 17689 | 27 | 0.004411 | 0.103420 | 99.656919 | 89.653943 |
| 9 | Bin10((0.010499999999999999, 0.0117]) | 17696 | 21 | 0.003431 | 0.103461 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_calib, n_bins=10)
%%time
# Define the model
xgb = XGBClassifier(nthread=-1, random_state=0, booster="dart")
# Train the model
xgb.fit(X_train,y_train)
xgb
Wall time: 11min 20s
XGBClassifier(base_score=0.5, booster='dart', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints='',
learning_rate=0.300000012, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan, monotone_constraints='()',
n_estimators=100, n_jobs=-1, nthread=-1, num_parallel_tree=1,
random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
subsample=1, tree_method='exact', validate_parameters=1,
verbosity=None)
Let's use the model to get predictions on test dataset. We would be looking at the predicted class and predicted probability both in order to evaluate the performance of the model
# Prediction
y_pred_xgbdart = xgb.predict(X_test)
y_prob_pred_xgbdart = xgb.predict_proba(X_test)[:, 1]
print("Y predicted : ", y_pred_xgbdart)
print("Y probability predicted : ", y_prob_pred_xgbdart[:5])
Y predicted : [False False False ... False False False] Y probability predicted : [0.00088362 0.01282003 0.00395442 0.00810351 0.00213888]
Let's compute various evaluation metrices now
# Compute Evaluation Metric
compute_evaluation_metric(xgb, X_test, y_test, y_pred_xgbdart, y_prob_pred_xgbdart)
Accuracy Score : 0.9797247716778993
AUC Score : 0.9348031681385632
Confusion Matrix :
[[170693 348]
[ 3244 2877]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.89 0.47 0.62 6121
accuracy 0.98 177162
macro avg 0.94 0.73 0.80 177162
weighted avg 0.98 0.98 0.98 177162
Concordance Index : 0.9348031547662842
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_xgbdart, y_prob_pred_xgbdart)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0472, 1.0]) | 12835 | 4882 | 0.797582 | 0.075040 | 79.758209 | 7.504049 |
| 1 | Bin2((0.0215, 0.0472]) | 17204 | 512 | 0.083646 | 0.100584 | 88.122856 | 17.562456 |
| 2 | Bin3((0.0132, 0.0215]) | 17460 | 256 | 0.041823 | 0.102081 | 92.305179 | 27.770535 |
| 3 | Bin4((0.00914, 0.0132]) | 17572 | 144 | 0.023526 | 0.102736 | 94.657736 | 38.044095 |
| 4 | Bin5((0.00659, 0.00914]) | 17606 | 109 | 0.017808 | 0.102934 | 96.438490 | 48.337533 |
| 5 | Bin6((0.00477, 0.00659]) | 17649 | 68 | 0.011109 | 0.103186 | 97.549420 | 58.656112 |
| 6 | Bin7((0.00333, 0.00477]) | 17662 | 54 | 0.008822 | 0.103262 | 98.431629 | 68.982291 |
| 7 | Bin8((0.00219, 0.00333]) | 17668 | 48 | 0.007842 | 0.103297 | 99.215814 | 79.311978 |
| 8 | Bin9((0.00121, 0.00219]) | 17689 | 27 | 0.004411 | 0.103420 | 99.656919 | 89.653943 |
| 9 | Bin10((-0.0009859, 0.00121]) | 17696 | 21 | 0.003431 | 0.103461 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_xgbdart, n_bins=10)
Let's look at LGBM
LightGBM is a gradient boosting framework that uses tree based learning algorithms.
It is designed to be distributed and efficient with the following advantages:
from lightgbm import LGBMClassifier
%%time
# Define the model
lgbc = LGBMClassifier(random_state=0, n_jobs = -1)
# Train the model
lgbc.fit(X_train,y_train)
lgbc
Wall time: 1min 16s
LGBMClassifier(random_state=0)
Let's use the model to get predictions on test dataset. We would be looking at the predicted class and predicted probability both in order to evaluate the performance of the model
# Prediction
y_pred_lgbc = lgbc.predict(X_test)
y_prob_pred_lgbc = lgbc.predict_proba(X_test)
y_prob_pred_lgbc = [x[1] for x in y_prob_pred_lgbc]
print("Y predicted : ",y_pred_lgbc)
print("Y probability predicted : ",y_prob_pred_lgbc[:5])
Y predicted : [False False False ... False False False] Y probability predicted : [0.0030038692434659836, 0.0204190888379695, 0.017504758484994488, 0.0071321007747928164, 0.0018421571426377808]
Let's compute various evaluation metrices now
# Compute Evaluation Metric
compute_evaluation_metric(lgbc, X_test, y_test, y_pred_lgbc, y_prob_pred_lgbc)
Accuracy Score : 0.9772355245481537
AUC Score : 0.9276554552960554
Confusion Matrix :
[[170633 408]
[ 3625 2496]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.86 0.41 0.55 6121
accuracy 0.98 177162
macro avg 0.92 0.70 0.77 177162
weighted avg 0.98 0.98 0.97 177162
Concordance Index : 0.9276554118361485
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_lgbc, y_prob_pred_lgbc)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0529, 0.996]) | 12911 | 4806 | 0.785166 | 0.075485 | 78.516582 | 7.548483 |
| 1 | Bin2((0.0253, 0.0529]) | 17203 | 513 | 0.083810 | 0.100578 | 86.897566 | 17.606305 |
| 2 | Bin3((0.0157, 0.0253]) | 17447 | 269 | 0.043947 | 0.102005 | 91.292273 | 27.806783 |
| 3 | Bin4((0.0111, 0.0157]) | 17549 | 167 | 0.027283 | 0.102601 | 94.020585 | 38.066896 |
| 4 | Bin5((0.00852, 0.0111]) | 17589 | 127 | 0.020748 | 0.102835 | 96.095409 | 48.350396 |
| 5 | Bin6((0.00665, 0.00852]) | 17629 | 87 | 0.014213 | 0.103069 | 97.516746 | 58.657281 |
| 6 | Bin7((0.00507, 0.00665]) | 17669 | 47 | 0.007678 | 0.103303 | 98.284594 | 68.987553 |
| 7 | Bin8((0.00374, 0.00507]) | 17673 | 43 | 0.007025 | 0.103326 | 98.987094 | 79.320163 |
| 8 | Bin9((0.00266, 0.00374]) | 17682 | 34 | 0.005555 | 0.103379 | 99.542558 | 89.658035 |
| 9 | Bin10((-0.000701, 0.00266]) | 17689 | 28 | 0.004574 | 0.103420 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_lgbc, n_bins=10)
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier
X_train.head()
| TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_19 | id_20 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | card2_missing_flag | card3_missing_flag | card4_missing_flag | card5_missing_flag | card6_missing_flag | addr1_missing_flag | addr2_missing_flag | dist1_missing_flag | dist2_missing_flag | P_emaildomain_missing_flag | R_emaildomain_missing_flag | D1_missing_flag | D2_missing_flag | D3_missing_flag | D4_missing_flag | D5_missing_flag | D6_missing_flag | D7_missing_flag | D8_missing_flag | D9_missing_flag | D10_missing_flag | D11_missing_flag | D12_missing_flag | D13_missing_flag | D14_missing_flag | D15_missing_flag | M1_missing_flag | M2_missing_flag | M3_missing_flag | M4_missing_flag | M5_missing_flag | M6_missing_flag | M7_missing_flag | M8_missing_flag | M9_missing_flag | V1_missing_flag | V2_missing_flag | V3_missing_flag | V4_missing_flag | V5_missing_flag | V6_missing_flag | V7_missing_flag | V8_missing_flag | V9_missing_flag | V10_missing_flag | V11_missing_flag | V12_missing_flag | V13_missing_flag | V14_missing_flag | V15_missing_flag | V16_missing_flag | V17_missing_flag | V18_missing_flag | V19_missing_flag | V20_missing_flag | V21_missing_flag | V22_missing_flag | V23_missing_flag | V24_missing_flag | V25_missing_flag | V26_missing_flag | V27_missing_flag | V28_missing_flag | V29_missing_flag | V30_missing_flag | V31_missing_flag | V32_missing_flag | V33_missing_flag | V34_missing_flag | V35_missing_flag | V36_missing_flag | V37_missing_flag | V38_missing_flag | V39_missing_flag | V40_missing_flag | V41_missing_flag | V42_missing_flag | V43_missing_flag | V44_missing_flag | V45_missing_flag | V46_missing_flag | V47_missing_flag | V48_missing_flag | V49_missing_flag | V50_missing_flag | V51_missing_flag | V52_missing_flag | V53_missing_flag | V54_missing_flag | V55_missing_flag | V56_missing_flag | V57_missing_flag | V58_missing_flag | V59_missing_flag | V60_missing_flag | V61_missing_flag | V62_missing_flag | V63_missing_flag | V64_missing_flag | V65_missing_flag | V66_missing_flag | V67_missing_flag | V68_missing_flag | V69_missing_flag | V70_missing_flag | V71_missing_flag | V72_missing_flag | V73_missing_flag | V74_missing_flag | V75_missing_flag | V76_missing_flag | V77_missing_flag | V78_missing_flag | V79_missing_flag | V80_missing_flag | V81_missing_flag | V82_missing_flag | V83_missing_flag | V84_missing_flag | V85_missing_flag | V86_missing_flag | V87_missing_flag | V88_missing_flag | V89_missing_flag | V90_missing_flag | V91_missing_flag | V92_missing_flag | V93_missing_flag | V94_missing_flag | V95_missing_flag | V96_missing_flag | V97_missing_flag | V98_missing_flag | V99_missing_flag | V100_missing_flag | V101_missing_flag | V102_missing_flag | V103_missing_flag | V104_missing_flag | V105_missing_flag | V106_missing_flag | V107_missing_flag | V108_missing_flag | V109_missing_flag | V110_missing_flag | V111_missing_flag | V112_missing_flag | V113_missing_flag | V114_missing_flag | V115_missing_flag | V116_missing_flag | V117_missing_flag | V118_missing_flag | V119_missing_flag | V120_missing_flag | V121_missing_flag | V122_missing_flag | V123_missing_flag | V124_missing_flag | V125_missing_flag | V126_missing_flag | V127_missing_flag | V128_missing_flag | V129_missing_flag | V130_missing_flag | V131_missing_flag | V132_missing_flag | V133_missing_flag | V134_missing_flag | V135_missing_flag | ... | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | PCA_V_0 | PCA_V_1 | PCA_V_2 | PCA_V_3 | PCA_V_4 | PCA_V_5 | PCA_V_6 | PCA_V_7 | PCA_V_8 | PCA_V_9 | PCA_V_10 | PCA_V_11 | PCA_V_12 | PCA_V_13 | PCA_V_14 | PCA_V_15 | PCA_V_16 | PCA_V_17 | PCA_V_18 | PCA_V_19 | PCA_V_20 | PCA_V_21 | PCA_V_22 | PCA_V_23 | PCA_V_24 | PCA_V_25 | PCA_V_26 | PCA_V_27 | PCA_V_28 | PCA_V_29 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 464452 | 3.432 | 4 | 15063 | 514.0 | 150.0 | 4 | 226.0 | 1 | 315.0 | 87.0 | 18.0 | 0 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 35.0 | 35.0 | 34.0 | 34.0 | 34.0 | NaN | NaN | NaN | 34.0 | 34.0 | NaN | NaN | NaN | 34.0 | 1 | 1 | 1 | 3 | 2 | 1 | 0 | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 4 | 18 | -104.06 | -0.4350 | 0.1501 | 0.2324 | 0.092611 | 0.13560 | -0.9330 | 0.3564 | 0.2969 | -0.05145 | 0.012596 | 0.01012 | 0.02788 | 0.1576 | -0.053300 | 0.13940 | 0.033940 | 0.021680 | -0.04034 | 0.02574 | -0.022110 | -0.003386 | 0.019290 | 0.013780 | -0.007637 | -0.031980 | 0.012856 | -0.007412 | 0.000978 | -0.016700 | -0.007600 | -0.027450 | 0.00541 | -0.006160 | 0.005920 | 0.008170 |
| 36372 | 3.717 | 0 | 6019 | 583.0 | 150.0 | 4 | 226.0 | 1 | NaN | NaN | NaN | 0 | 0 | 928.0 | 1301.0 | 0.0 | 475.0 | 0.0 | 475.0 | 476.0 | 688.0 | 0.0 | 823.0 | 667.0 | 667.0 | 695.0 | 300.0 | 0.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 302.0 | 0.0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | -5.0 | 49333.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 100.0 | 1 | 52.0 | -360.0 | 0 | 0 | 166.0 | 410.0 | 611.0 | 0 | 0 | 4 | 0 | 24.0 | 0.02858 | 3 | 1 | 0 | 1 | 0 | 0 | 0.0808 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 11 | -93.94 | -0.3928 | 0.1823 | 0.3086 | 0.103778 | 0.18000 | 0.6445 | -0.9595 | 0.6094 | 0.50200 | -0.377700 | -0.15760 | 0.07880 | 0.1742 | 1.205000 | 0.12660 | -0.007683 | -0.130900 | -0.04294 | 0.00345 | 0.057530 | 0.048770 | -0.015470 | -0.016300 | 0.073850 | 0.016130 | -0.018260 | 0.036070 | -0.019200 | -0.050660 | -0.045530 | -0.032350 | -0.02182 | 0.038240 | -0.009430 | 0.004864 |
| 572387 | 5.527 | 4 | 17188 | 321.0 | 150.0 | 4 | 226.0 | 2 | 299.0 | 87.0 | 1.0 | 6 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 14.0 | 1.0 | 568.0 | 80.0 | 1.0 | 569.0 | 1.0 | NaN | NaN | NaN | 569.0 | 569.0 | NaN | NaN | NaN | 569.0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 21 | 25 | 116.00 | 0.4849 | 1.9370 | 1.8850 | 1.348902 | 1.10000 | -0.8013 | 0.3174 | 0.2720 | -0.02687 | 0.049070 | -0.00634 | -0.02972 | -0.2195 | 0.019070 | 0.03270 | -0.009254 | -0.009834 | 0.02269 | -0.02753 | -0.009140 | -0.002127 | -0.001934 | -0.025130 | 0.016200 | 0.040620 | 0.000128 | 0.001646 | -0.004090 | 0.010020 | 0.002650 | 0.010440 | 0.00992 | 0.005665 | -0.008720 | -0.011665 |
| 497276 | 3.950 | 4 | 1214 | 174.0 | 150.0 | 4 | 226.0 | 1 | 204.0 | 87.0 | NaN | 4 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | 1 | 1 | 1 | 3 | 2 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | NaN | NaN | 3 | 2 | NaN | NaN | NaN | 2 | 2 | 3 | 4 | NaN | NaN | 4 | 2 | 2 | 2 | 2 | 2 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 19 | 30 | -83.06 | -0.3474 | 0.2540 | 0.3901 | 0.147945 | 0.22770 | -0.9326 | 0.3564 | 0.2974 | -0.05154 | 0.010610 | 0.00979 | 0.02590 | 0.1611 | -0.054500 | 0.13270 | 0.035740 | -0.010574 | -0.00642 | -0.03049 | -0.012240 | 0.017960 | 0.018100 | 0.002857 | -0.016740 | 0.007100 | 0.018840 | -0.006878 | 0.008800 | -0.004406 | 0.000245 | -0.014570 | -0.00884 | -0.011110 | -0.000597 | -0.007458 |
| 97470 | 3.219 | 1 | 12695 | 490.0 | 150.0 | 4 | 226.0 | 2 | 325.0 | 87.0 | NaN | 6 | 6 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | -5.0 | 68281.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | 100.0 | 1 | 52.0 | NaN | 1 | 1 | 225.0 | 266.0 | 507.0 | 1 | 1 | 3 | 0 | NaN | NaN | 4 | 0 | 0 | 1 | 0 | 0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 21 | 23 | -110.00 | -0.4600 | 0.1771 | 0.1877 | 0.116133 | 0.10956 | 2.8070 | 0.3457 | 0.3772 | -0.01767 | -0.042420 | -0.24430 | 0.02310 | 0.0118 | 0.000914 | 0.02304 | -0.009550 | -0.022300 | -0.02167 | -0.01366 | -0.005627 | -0.009690 | 0.001454 | -0.008240 | -0.008900 | 0.005936 | -0.004475 | -0.000716 | -0.003060 | 0.008514 | -0.017840 | 0.000079 | 0.01335 | -0.005287 | -0.000648 | 0.007446 |
5 rows × 533 columns
Impute Missing values. Since sklearn algos are not designed to handle missing values.
from sklearn.impute import KNNImputer, SimpleImputer
# replace inf
X_train = X_train.replace([np.inf, -np.inf], np.nan)
X_test = X_test.replace([np.inf, -np.inf], np.nan)
# Impute
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# imputer = KNNImputer(n_neighbors=3)
X_train_imputed = imputer.fit_transform(X_train)
X_train_imputed = pd.DataFrame(X_train_imputed, columns=X_train.columns)
X_train_imputed.head()
| TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | card6 | addr1 | addr2 | dist1 | P_emaildomain | R_emaildomain | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | D1 | D2 | D3 | D4 | D5 | D6 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_09 | id_10 | id_11 | id_12 | id_13 | id_14 | id_15 | id_16 | id_17 | id_19 | id_20 | id_28 | id_29 | id_30 | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo | card2_missing_flag | card3_missing_flag | card4_missing_flag | card5_missing_flag | card6_missing_flag | addr1_missing_flag | addr2_missing_flag | dist1_missing_flag | dist2_missing_flag | P_emaildomain_missing_flag | R_emaildomain_missing_flag | D1_missing_flag | D2_missing_flag | D3_missing_flag | D4_missing_flag | D5_missing_flag | D6_missing_flag | D7_missing_flag | D8_missing_flag | D9_missing_flag | D10_missing_flag | D11_missing_flag | D12_missing_flag | D13_missing_flag | D14_missing_flag | D15_missing_flag | M1_missing_flag | M2_missing_flag | M3_missing_flag | M4_missing_flag | M5_missing_flag | M6_missing_flag | M7_missing_flag | M8_missing_flag | M9_missing_flag | V1_missing_flag | V2_missing_flag | V3_missing_flag | V4_missing_flag | V5_missing_flag | V6_missing_flag | V7_missing_flag | V8_missing_flag | V9_missing_flag | V10_missing_flag | V11_missing_flag | V12_missing_flag | V13_missing_flag | V14_missing_flag | V15_missing_flag | V16_missing_flag | V17_missing_flag | V18_missing_flag | V19_missing_flag | V20_missing_flag | V21_missing_flag | V22_missing_flag | V23_missing_flag | V24_missing_flag | V25_missing_flag | V26_missing_flag | V27_missing_flag | V28_missing_flag | V29_missing_flag | V30_missing_flag | V31_missing_flag | V32_missing_flag | V33_missing_flag | V34_missing_flag | V35_missing_flag | V36_missing_flag | V37_missing_flag | V38_missing_flag | V39_missing_flag | V40_missing_flag | V41_missing_flag | V42_missing_flag | V43_missing_flag | V44_missing_flag | V45_missing_flag | V46_missing_flag | V47_missing_flag | V48_missing_flag | V49_missing_flag | V50_missing_flag | V51_missing_flag | V52_missing_flag | V53_missing_flag | V54_missing_flag | V55_missing_flag | V56_missing_flag | V57_missing_flag | V58_missing_flag | V59_missing_flag | V60_missing_flag | V61_missing_flag | V62_missing_flag | V63_missing_flag | V64_missing_flag | V65_missing_flag | V66_missing_flag | V67_missing_flag | V68_missing_flag | V69_missing_flag | V70_missing_flag | V71_missing_flag | V72_missing_flag | V73_missing_flag | V74_missing_flag | V75_missing_flag | V76_missing_flag | V77_missing_flag | V78_missing_flag | V79_missing_flag | V80_missing_flag | V81_missing_flag | V82_missing_flag | V83_missing_flag | V84_missing_flag | V85_missing_flag | V86_missing_flag | V87_missing_flag | V88_missing_flag | V89_missing_flag | V90_missing_flag | V91_missing_flag | V92_missing_flag | V93_missing_flag | V94_missing_flag | V95_missing_flag | V96_missing_flag | V97_missing_flag | V98_missing_flag | V99_missing_flag | V100_missing_flag | V101_missing_flag | V102_missing_flag | V103_missing_flag | V104_missing_flag | V105_missing_flag | V106_missing_flag | V107_missing_flag | V108_missing_flag | V109_missing_flag | V110_missing_flag | V111_missing_flag | V112_missing_flag | V113_missing_flag | V114_missing_flag | V115_missing_flag | V116_missing_flag | V117_missing_flag | V118_missing_flag | V119_missing_flag | V120_missing_flag | V121_missing_flag | V122_missing_flag | V123_missing_flag | V124_missing_flag | V125_missing_flag | V126_missing_flag | V127_missing_flag | V128_missing_flag | V129_missing_flag | V130_missing_flag | V131_missing_flag | V132_missing_flag | V133_missing_flag | V134_missing_flag | V135_missing_flag | ... | V169_missing_flag | V170_missing_flag | V171_missing_flag | V172_missing_flag | V173_missing_flag | V174_missing_flag | V175_missing_flag | V176_missing_flag | V177_missing_flag | V178_missing_flag | V179_missing_flag | V180_missing_flag | V181_missing_flag | V182_missing_flag | V183_missing_flag | V184_missing_flag | V185_missing_flag | V186_missing_flag | V187_missing_flag | V188_missing_flag | V189_missing_flag | V190_missing_flag | V191_missing_flag | V192_missing_flag | V193_missing_flag | V194_missing_flag | V195_missing_flag | V196_missing_flag | V197_missing_flag | V198_missing_flag | V199_missing_flag | V200_missing_flag | V201_missing_flag | V202_missing_flag | V203_missing_flag | V204_missing_flag | V205_missing_flag | V206_missing_flag | V207_missing_flag | V208_missing_flag | V209_missing_flag | V210_missing_flag | V211_missing_flag | V212_missing_flag | V213_missing_flag | V214_missing_flag | V215_missing_flag | V216_missing_flag | V217_missing_flag | V218_missing_flag | V219_missing_flag | V220_missing_flag | V221_missing_flag | V222_missing_flag | V223_missing_flag | V224_missing_flag | V225_missing_flag | V226_missing_flag | V227_missing_flag | V228_missing_flag | V229_missing_flag | V230_missing_flag | V231_missing_flag | V232_missing_flag | V233_missing_flag | V234_missing_flag | V235_missing_flag | V236_missing_flag | V237_missing_flag | V238_missing_flag | V239_missing_flag | V240_missing_flag | V241_missing_flag | V242_missing_flag | V243_missing_flag | V244_missing_flag | V245_missing_flag | V246_missing_flag | V247_missing_flag | V248_missing_flag | V249_missing_flag | V250_missing_flag | V251_missing_flag | V252_missing_flag | V253_missing_flag | V254_missing_flag | V255_missing_flag | V256_missing_flag | V257_missing_flag | V258_missing_flag | V259_missing_flag | V260_missing_flag | V261_missing_flag | V262_missing_flag | V263_missing_flag | V264_missing_flag | V265_missing_flag | V266_missing_flag | V267_missing_flag | V268_missing_flag | V269_missing_flag | V270_missing_flag | V271_missing_flag | V272_missing_flag | V273_missing_flag | V274_missing_flag | V275_missing_flag | V276_missing_flag | V277_missing_flag | V278_missing_flag | V279_missing_flag | V280_missing_flag | V281_missing_flag | V282_missing_flag | V283_missing_flag | V284_missing_flag | V285_missing_flag | V286_missing_flag | V287_missing_flag | V288_missing_flag | V289_missing_flag | V290_missing_flag | V291_missing_flag | V292_missing_flag | V293_missing_flag | V294_missing_flag | V295_missing_flag | V296_missing_flag | V297_missing_flag | V298_missing_flag | V299_missing_flag | V300_missing_flag | V301_missing_flag | V302_missing_flag | V303_missing_flag | V304_missing_flag | V305_missing_flag | V306_missing_flag | V307_missing_flag | V308_missing_flag | V309_missing_flag | V310_missing_flag | V311_missing_flag | V312_missing_flag | V313_missing_flag | V314_missing_flag | V315_missing_flag | V316_missing_flag | V317_missing_flag | V318_missing_flag | V319_missing_flag | V320_missing_flag | V321_missing_flag | V322_missing_flag | V323_missing_flag | V324_missing_flag | V325_missing_flag | V326_missing_flag | V327_missing_flag | V328_missing_flag | V329_missing_flag | V330_missing_flag | V331_missing_flag | V332_missing_flag | V333_missing_flag | V334_missing_flag | V335_missing_flag | V336_missing_flag | V337_missing_flag | V338_missing_flag | V339_missing_flag | id_01_missing_flag | id_02_missing_flag | id_03_missing_flag | id_04_missing_flag | id_05_missing_flag | id_06_missing_flag | id_07_missing_flag | id_08_missing_flag | id_09_missing_flag | id_10_missing_flag | id_11_missing_flag | id_12_missing_flag | id_13_missing_flag | id_14_missing_flag | id_15_missing_flag | id_16_missing_flag | id_17_missing_flag | id_18_missing_flag | id_19_missing_flag | id_20_missing_flag | id_21_missing_flag | id_22_missing_flag | id_23_missing_flag | id_24_missing_flag | id_25_missing_flag | id_26_missing_flag | id_27_missing_flag | id_28_missing_flag | id_29_missing_flag | id_30_missing_flag | id_31_missing_flag | id_32_missing_flag | id_33_missing_flag | id_34_missing_flag | id_35_missing_flag | id_36_missing_flag | id_37_missing_flag | id_38_missing_flag | DeviceType_missing_flag | DeviceInfo_missing_flag | _Weekdays | _Hours | _Days | Trans_min_mean | Trans_min_std | TransactionAmt_to_mean_card1 | TransactionAmt_to_mean_card4 | TransactionAmt_to_std_card1 | TransactionAmt_to_std_card4 | PCA_V_0 | PCA_V_1 | PCA_V_2 | PCA_V_3 | PCA_V_4 | PCA_V_5 | PCA_V_6 | PCA_V_7 | PCA_V_8 | PCA_V_9 | PCA_V_10 | PCA_V_11 | PCA_V_12 | PCA_V_13 | PCA_V_14 | PCA_V_15 | PCA_V_16 | PCA_V_17 | PCA_V_18 | PCA_V_19 | PCA_V_20 | PCA_V_21 | PCA_V_22 | PCA_V_23 | PCA_V_24 | PCA_V_25 | PCA_V_26 | PCA_V_27 | PCA_V_28 | PCA_V_29 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.432 | 4.0 | 15063.0 | 514.0 | 150.0 | 4.0 | 226.0 | 1.0 | 315.00000 | 87.000000 | 18.00000 | 0.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 35.0 | 35.000000 | 34.000000 | 34.000000 | 34.000000 | 69.466886 | 145.896923 | 0.562258 | 34.000000 | 34.00000 | 53.870766 | 17.999492 | 57.910973 | 34.000000 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 1.0 | 0.0 | 0.0 | 1.0 | -10.167535 | 175100.840652 | 0.063532 | -0.055814 | 1.628387 | -6.764911 | 0.095223 | -0.30179 | 99.745253 | 2.0 | 48.029567 | -344.788868 | 3.0 | 2.0 | 189.486242 | 352.986677 | 403.654199 | 2.0 | 2.0 | 3.0 | 4.0 | 26.519707 | 0.011556 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 0.041219 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 4.0 | 18.0 | -104.06 | -0.4350 | 0.1501 | 0.2324 | 0.092611 | 0.13560 | -0.9330 | 0.3564 | 0.2969 | -0.05145 | 0.012596 | 0.01012 | 0.02788 | 0.1576 | -0.053300 | 0.13940 | 0.033940 | 0.021680 | -0.04034 | 0.02574 | -0.022110 | -0.003386 | 0.019290 | 0.013780 | -0.007637 | -0.031980 | 0.012856 | -0.007412 | 0.000978 | -0.016700 | -0.007600 | -0.027450 | 0.00541 | -0.006160 | 0.005920 | 0.008170 |
| 1 | 3.717 | 0.0 | 6019.0 | 583.0 | 150.0 | 4.0 | 226.0 | 1.0 | 290.73683 | 86.796065 | 119.26547 | 0.0 | 0.0 | 928.0 | 1301.0 | 0.0 | 475.0 | 0.0 | 475.0 | 476.0 | 688.0 | 0.0 | 823.0 | 667.0 | 667.0 | 695.0 | 300.0 | 0.0 | 169.661838 | 28.353969 | 0.000000 | 42.323807 | 0.000000 | 145.896923 | 0.562258 | 123.914942 | 146.43237 | 0.000000 | 0.000000 | 302.000000 | 0.000000 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | -5.000000 | 49333.000000 | 0.000000 | 0.000000 | 0.000000 | -1.000000 | 0.000000 | 0.00000 | 100.000000 | 1.0 | 52.000000 | -360.000000 | 0.0 | 0.0 | 166.000000 | 410.000000 | 611.000000 | 0.0 | 0.0 | 4.0 | 0.0 | 24.000000 | 0.028580 | 3.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.080800 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 11.0 | -93.94 | -0.3928 | 0.1823 | 0.3086 | 0.103778 | 0.18000 | 0.6445 | -0.9595 | 0.6094 | 0.50200 | -0.377700 | -0.15760 | 0.07880 | 0.1742 | 1.205000 | 0.12660 | -0.007683 | -0.130900 | -0.04294 | 0.00345 | 0.057530 | 0.048770 | -0.015470 | -0.016300 | 0.073850 | 0.016130 | -0.018260 | 0.036070 | -0.019200 | -0.050660 | -0.045530 | -0.032350 | -0.02182 | 0.038240 | -0.009430 | 0.004864 |
| 2 | 5.527 | 4.0 | 17188.0 | 321.0 | 150.0 | 4.0 | 226.0 | 2.0 | 299.00000 | 87.000000 | 1.00000 | 6.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 14.0 | 1.0 | 568.0 | 80.000000 | 1.000000 | 569.000000 | 1.000000 | 69.466886 | 145.896923 | 0.562258 | 569.000000 | 569.00000 | 53.870766 | 17.999492 | 57.910973 | 569.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | -10.167535 | 175100.840652 | 0.063532 | -0.055814 | 1.628387 | -6.764911 | 0.095223 | -0.30179 | 99.745253 | 2.0 | 48.029567 | -344.788868 | 3.0 | 2.0 | 189.486242 | 352.986677 | 403.654199 | 2.0 | 2.0 | 3.0 | 4.0 | 26.519707 | 0.011556 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 0.041219 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 21.0 | 25.0 | 116.00 | 0.4849 | 1.9370 | 1.8850 | 1.348902 | 1.10000 | -0.8013 | 0.3174 | 0.2720 | -0.02687 | 0.049070 | -0.00634 | -0.02972 | -0.2195 | 0.019070 | 0.03270 | -0.009254 | -0.009834 | 0.02269 | -0.02753 | -0.009140 | -0.002127 | -0.001934 | -0.025130 | 0.016200 | 0.040620 | 0.000128 | 0.001646 | -0.004090 | 0.010020 | 0.002650 | 0.010440 | 0.00992 | 0.005665 | -0.008720 | -0.011665 |
| 3 | 3.950 | 4.0 | 1214.0 | 174.0 | 150.0 | 4.0 | 226.0 | 1.0 | 204.00000 | 87.000000 | 119.26547 | 4.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 169.661838 | 28.353969 | 0.000000 | 42.323807 | 69.466886 | 145.896923 | 0.562258 | 0.000000 | 0.00000 | 53.870766 | 17.999492 | 57.910973 | 0.000000 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | -10.167535 | 175100.840652 | 0.063532 | -0.055814 | 1.628387 | -6.764911 | 0.095223 | -0.30179 | 99.745253 | 2.0 | 48.029567 | -344.788868 | 3.0 | 2.0 | 189.486242 | 352.986677 | 403.654199 | 2.0 | 2.0 | 3.0 | 4.0 | 26.519707 | 0.011556 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 0.041219 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 19.0 | 30.0 | -83.06 | -0.3474 | 0.2540 | 0.3901 | 0.147945 | 0.22770 | -0.9326 | 0.3564 | 0.2974 | -0.05154 | 0.010610 | 0.00979 | 0.02590 | 0.1611 | -0.054500 | 0.13270 | 0.035740 | -0.010574 | -0.00642 | -0.03049 | -0.012240 | 0.017960 | 0.018100 | 0.002857 | -0.016740 | 0.007100 | 0.018840 | -0.006878 | 0.008800 | -0.004406 | 0.000245 | -0.014570 | -0.00884 | -0.011110 | -0.000597 | -0.007458 |
| 4 | 3.219 | 1.0 | 12695.0 | 490.0 | 150.0 | 4.0 | 226.0 | 2.0 | 325.00000 | 87.000000 | 119.26547 | 6.0 | 6.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 169.661838 | 28.353969 | 139.771485 | 42.323807 | 69.466886 | 145.896923 | 0.562258 | 123.914942 | 146.43237 | 53.870766 | 17.999492 | 57.910973 | 163.670779 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | -5.000000 | 68281.000000 | 0.063532 | -0.055814 | 0.000000 | 0.000000 | 0.095223 | -0.30179 | 100.000000 | 1.0 | 52.000000 | -344.788868 | 1.0 | 1.0 | 225.000000 | 266.000000 | 507.000000 | 1.0 | 1.0 | 3.0 | 0.0 | 26.519707 | 0.011556 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.041219 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 5.0 | 21.0 | 23.0 | -110.00 | -0.4600 | 0.1771 | 0.1877 | 0.116133 | 0.10956 | 2.8070 | 0.3457 | 0.3772 | -0.01767 | -0.042420 | -0.24430 | 0.02310 | 0.0118 | 0.000914 | 0.02304 | -0.009550 | -0.022300 | -0.02167 | -0.01366 | -0.005627 | -0.009690 | 0.001454 | -0.008240 | -0.008900 | 0.005936 | -0.004475 | -0.000716 | -0.003060 | 0.008514 | -0.017840 | 0.000079 | 0.01335 | -0.005287 | -0.000648 | 0.007446 |
5 rows × 533 columns
Build and train the Classifier
%%time
# Define the model
rfc = RandomForestClassifier(random_state=0, n_jobs = -1)
# rfc = ExtraTreesClassifier(random_state=0, n_jobs = -1)
# rfc = AdaBoostClassifier(random_state=0)
# rfc = GradientBoostingClassifier(random_state=0)
# Train the model
rfc.fit(X_train_imputed, y_train)
rfc
Predicting on test data
# Impute X_Test before predicting
X_test_imputed = imputer.transform(X_test)
# Prediction
y_pred_rfc = rfc.predict(X_test_imputed)
y_prob_pred_rfc = rfc.predict_proba(X_test_imputed)[:, 1]
print("Y predicted : ",y_pred_rfc)
print("Y probability predicted : ",y_prob_pred_rfc[:5])
Y predicted : [False False False ... False False False] Y probability predicted : [0.01302948 0.03184159 0.0321819 0.01066897 0.00670777]
Evaluation metrics
# Compute Evaluation Metric
compute_evaluation_metric(rfc, X_test_imputed, y_test, y_pred_rfc, y_prob_pred_rfc)
Accuracy Score : 0.9736173671554849
AUC Score : 0.8775531354407143
Confusion Matrix :
[[170697 344]
[ 4330 1791]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.84 0.29 0.43 6121
accuracy 0.97 177162
macro avg 0.91 0.65 0.71 177162
weighted avg 0.97 0.97 0.97 177162
Concordance Index : 0.8775057827680287
ROC curve :
PR curve :
# Concordance
concordance(y_test.values, y_prob_pred_rfc)
0.8775057827680287
# Gains Table and Capture rates plot
captures(y_test, y_pred_rfc, y_prob_pred_rfc)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0532, 0.999]) | 13669 | 4048 | 0.661330 | 0.079917 | 66.132985 | 7.991651 |
| 1 | Bin2((0.0311, 0.0532]) | 17001 | 715 | 0.116811 | 0.099397 | 77.814083 | 17.931373 |
| 2 | Bin3((0.0213, 0.0311]) | 17318 | 398 | 0.065022 | 0.101251 | 84.316288 | 28.056431 |
| 3 | Bin4((0.016, 0.0213]) | 17441 | 275 | 0.044927 | 0.101970 | 88.809018 | 38.253401 |
| 4 | Bin5((0.013, 0.016]) | 17504 | 182 | 0.029734 | 0.102338 | 91.782388 | 48.487205 |
| 5 | Bin6((0.0114, 0.013]) | 17574 | 172 | 0.028100 | 0.102747 | 94.592387 | 58.761934 |
| 6 | Bin7((0.0097, 0.0114]) | 17552 | 109 | 0.017808 | 0.102619 | 96.373142 | 69.023801 |
| 7 | Bin8((0.00801, 0.0097]) | 17682 | 88 | 0.014377 | 0.103379 | 97.810815 | 79.361674 |
| 8 | Bin9((0.0068, 0.00801]) | 17636 | 81 | 0.013233 | 0.103110 | 99.134128 | 89.672652 |
| 9 | Bin10((0.00179, 0.0068]) | 17664 | 53 | 0.008659 | 0.103273 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_rfc, n_bins=10)
Imbalanced classes are a common problem in machine learning classification where there are a disproportionate ratio of observations in each class.
Most machine learning algorithms work best when the number of samples in each class are about equal. This is because most algorithms are designed to maximize accuracy and reduce error.
# random over sampler
ros = RandomOverSampler()
X_train_ros, y_train_ros = ros.fit_sample(X_train_imputed, y_train)
y_train_ros.value_counts()
True 398836 False 398836 Name: isFraud, dtype: int64
%%time
# Define the model
lgbc_ros = LGBMClassifier(random_state=0)
# Train the model
lgbc_ros.fit(X_train_ros,y_train_ros)
lgbc_ros
Wall time: 3min 20s
LGBMClassifier(random_state=0)
Let's use the model to get predictions on test dataset. We would be looking at the predicted class and predicted probability both in order to evaluate the performance of the model
# Prediction on the original test dataset
y_pred_lgbcros = lgbc_ros.predict(X_test_imputed)
y_prob_pred_lgbcros = lgbc_ros.predict_proba(X_test_imputed)[:, 1]
print("Y predicted : ",y_pred_lgbcros)
print("Y probability predicted : ",y_prob_pred_lgbcros[:5])
Y predicted : [False False False ... False False True] Y probability predicted : [0.04800962 0.30784135 0.30226978 0.104903 0.03731646]
Let's compute various evaluation metrices now
# Compute Evaluation Metric
compute_evaluation_metric(lgbc_ros, X_test_imputed, y_test, y_pred_lgbcros, y_prob_pred_lgbcros)
Accuracy Score : 0.8851503144015083
AUC Score : 0.9252961635759673
Confusion Matrix :
[[151857 19184]
[ 1163 4958]]
Classification Report :
precision recall f1-score support
False 0.99 0.89 0.94 171041
True 0.21 0.81 0.33 6121
accuracy 0.89 177162
macro avg 0.60 0.85 0.63 177162
weighted avg 0.97 0.89 0.92 177162
Concordance Index : 0.9252961416072233
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_lgbcros, y_prob_pred_lgbcros)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.587, 0.998]) | 13072 | 4645 | 0.758863 | 0.076426 | 75.886293 | 7.642612 |
| 1 | Bin2((0.386, 0.587]) | 17080 | 636 | 0.103905 | 0.099859 | 86.276752 | 17.628522 |
| 2 | Bin3((0.269, 0.386]) | 17398 | 318 | 0.051952 | 0.101718 | 91.471982 | 27.800352 |
| 3 | Bin4((0.204, 0.269]) | 17520 | 196 | 0.032021 | 0.102432 | 94.674073 | 38.043510 |
| 4 | Bin5((0.16, 0.204]) | 17593 | 123 | 0.020095 | 0.102858 | 96.683548 | 48.329348 |
| 5 | Bin6((0.126, 0.16]) | 17650 | 66 | 0.010783 | 0.103192 | 97.761804 | 58.648511 |
| 6 | Bin7((0.0985, 0.126]) | 17659 | 57 | 0.009312 | 0.103244 | 98.693024 | 68.972936 |
| 7 | Bin8((0.0746, 0.0985]) | 17681 | 35 | 0.005718 | 0.103373 | 99.264826 | 79.310224 |
| 8 | Bin9((0.0519, 0.0746]) | 17686 | 30 | 0.004901 | 0.103402 | 99.754942 | 89.650435 |
| 9 | Bin10((0.0035399999999999997, 0.0519]) | 17702 | 15 | 0.002451 | 0.103496 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_lgbcros, n_bins=10)
The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
%%time
# Define the model
lgbc_bal = LGBMClassifier(random_state=0, class_weight='balanced')
# Train the model
lgbc_bal.fit(X_train_imputed, y_train)
lgbc_bal
Wall time: 1min 30s
LGBMClassifier(class_weight='balanced', random_state=0)
# Prediction
y_pred_lgbcbal = lgbc_bal.predict(X_test)
y_prob_pred_lgbcbal = lgbc_bal.predict_proba(X_test)[:, 1]
print("Y predicted : ",y_pred_lgbcbal)
print("Y probability predicted : ",y_prob_pred_lgbcbal[:5])
Y predicted : [False False False ... False False True] Y probability predicted : [0.17759182 0.43730851 0.42168771 0.41189862 0.17474149]
# Compute Evaluation Metric
compute_evaluation_metric(lgbc_bal, X_test, y_test, y_pred_lgbcbal, y_prob_pred_lgbcbal)
Accuracy Score : 0.7243144692428399
AUC Score : 0.9084094113408069
Confusion Matrix :
[[122883 48158]
[ 683 5438]]
Classification Report :
precision recall f1-score support
False 0.99 0.72 0.83 171041
True 0.10 0.89 0.18 6121
accuracy 0.72 177162
macro avg 0.55 0.80 0.51 177162
weighted avg 0.96 0.72 0.81 177162
Concordance Index : 0.9084093946254581
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_lgbcbal, y_prob_pred_lgbcbal)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.71, 0.998]) | 13398 | 4319 | 0.705604 | 0.078332 | 70.560366 | 7.833210 |
| 1 | Bin2((0.592, 0.71]) | 16984 | 732 | 0.119588 | 0.099298 | 82.519196 | 17.762992 |
| 2 | Bin3((0.502, 0.592]) | 17337 | 379 | 0.061918 | 0.101362 | 88.710995 | 27.899159 |
| 3 | Bin4((0.42, 0.502]) | 17452 | 263 | 0.042967 | 0.102034 | 93.007678 | 38.102560 |
| 4 | Bin5((0.345, 0.42]) | 17566 | 151 | 0.024669 | 0.102701 | 95.474596 | 48.372612 |
| 5 | Bin6((0.276, 0.345]) | 17603 | 113 | 0.018461 | 0.102917 | 97.320699 | 58.664297 |
| 6 | Bin7((0.212, 0.276]) | 17643 | 73 | 0.011926 | 0.103151 | 98.513315 | 68.979368 |
| 7 | Bin8((0.155, 0.212]) | 17674 | 42 | 0.006862 | 0.103332 | 99.199477 | 79.312562 |
| 8 | Bin9((0.102, 0.155]) | 17680 | 36 | 0.005881 | 0.103367 | 99.787616 | 89.649265 |
| 9 | Bin10((0.00214, 0.102]) | 17704 | 13 | 0.002124 | 0.103507 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_lgbcbal, n_bins=10)
from sklearn.calibration import CalibratedClassifierCV
lgbc_bal = LGBMClassifier(random_state=0)
calibrated_clf = CalibratedClassifierCV(base_estimator=lgbc_bal, cv=3, method='sigmoid')
calibrated_clf.fit(X_train_imputed, y_train)
CalibratedClassifierCV(base_estimator=LGBMClassifier(random_state=0), cv=3)
# Prediction
y_pred_calib = calibrated_clf.predict(X_test)
y_prob_pred_calib = calibrated_clf.predict_proba(X_test)[:, 1]
len(calibrated_clf.calibrated_classifiers_)
3
print("Y predicted : ", y_pred_calib)
print("Y probability predicted : ", y_prob_pred_calib[:5])
Y predicted : [False False False ... False False True] Y probability predicted : [0.01690884 0.01522014 0.017496 0.01598323 0.01401708]
# Compute Evaluation Metric
compute_evaluation_metric(calibrated_clf, X_test_imputed, y_test, y_pred_calib, y_prob_pred_calib)
Accuracy Score : 0.976259017170725
AUC Score : 0.9181559291804907
Confusion Matrix :
[[170242 799]
[ 3407 2714]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.77 0.44 0.56 6121
accuracy 0.98 177162
macro avg 0.88 0.72 0.78 177162
weighted avg 0.97 0.98 0.97 177162
Concordance Index : 0.9181559244046767
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_calib, y_prob_pred_calib)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0513, 0.998]) | 13138 | 4579 | 0.748080 | 0.076812 | 74.808038 | 7.681199 |
| 1 | Bin2((0.0278, 0.0513]) | 17093 | 623 | 0.101781 | 0.099935 | 84.986113 | 17.674710 |
| 2 | Bin3((0.0217, 0.0278]) | 17408 | 308 | 0.050319 | 0.101777 | 90.017971 | 27.852386 |
| 3 | Bin4((0.0189, 0.0217]) | 17490 | 226 | 0.036922 | 0.102256 | 93.710178 | 38.078005 |
| 4 | Bin5((0.0172, 0.0189]) | 17598 | 118 | 0.019278 | 0.102888 | 95.637968 | 48.366766 |
| 5 | Bin6((0.016, 0.0172]) | 17629 | 87 | 0.014213 | 0.103069 | 97.059304 | 58.673651 |
| 6 | Bin7((0.0152, 0.016]) | 17649 | 67 | 0.010946 | 0.103186 | 98.153896 | 68.992230 |
| 7 | Bin8((0.0145, 0.0152]) | 17664 | 52 | 0.008495 | 0.103273 | 99.003431 | 79.319578 |
| 8 | Bin9((0.0138, 0.0145]) | 17678 | 38 | 0.006208 | 0.103355 | 99.624244 | 89.655112 |
| 9 | Bin10((0.011800000000000001, 0.0138]) | 17694 | 23 | 0.003758 | 0.103449 | 100.000000 | 100.000000 |
draw_calibration_curve(y_test, y_prob_pred_calib, n_bins=10)
Hyperparameter is a parameter that governs how the algorithm trains to learn the relationships. The values are set before the learning process begins.
Hyperparameter tuning refers to the automatic optimization of the hyper-parameters of a ML model.
%%time
# Define the estimator
lgbmclassifier = LGBMClassifier(random_state=0)
# Define the parameters gird
param_grid = {
'n_estimator' : [100,200], # default: 100
'num_leaves' : [256,128], # default: 256
'max_depth' : [5, 8], # default: 8
'learning_rate' : [0.05, 0.1], # default: .1
'reg_alpha' : [0 .1, 0.5], # default: .5
'class_weight' : ['balanced', None],
}
# run grid search
grid = GridSearchCV(lgbmclassifier, param_grid=param_grid, refit = True, verbose = 3, n_jobs=-1,cv = 3)
# fit the model for grid search
grid.fit(X_train, y_train)
Fitting 3 folds for each of 64 candidates, totalling 192 fits Wall time: 2h 27min 26s
GridSearchCV(cv=3, estimator=LGBMClassifier(random_state=0), n_jobs=-1,
param_grid={'class_weight': ['balanced', None],
'learning_rate': [0.05, 0.1], 'max_depth': [5, 8],
'n_estimator': [100, 200], 'num_leaves': [256, 128],
'reg_alpha': [0.1, 0.5]},
verbose=3)
Get the best parameters corresponding to which you have best model
# Best parameter after hyper parameter tuning
print(grid.best_params_)
# Moel Parameters
print(grid.best_estimator_)
lgbmclassifier = grid.best_estimator_
{'class_weight': None, 'learning_rate': 0.1, 'max_depth': 8, 'n_estimator': 100, 'num_leaves': 256, 'reg_alpha': 0.5}
LGBMClassifier(max_depth=8, n_estimator=100, num_leaves=256, random_state=0,
reg_alpha=0.5)
Let's use the best model to get predictions on test dataset. We would be looking at the predicted class and predicted probability both in order to evaluate the performance of the model
# Prediction using best parameters
y_grid_pred = lgbmclassifier.predict(X_test)
y_prob_grid_pred = lgbmclassifier.predict_proba(X_test)[:, 1]
print("Y predicted : ",y_grid_pred)
print("Y probability predicted : ",y_prob_grid_pred[:5])
Y predicted : [False False False ... False False False] Y probability predicted : [0.00210893 0.02470222 0.01490586 0.00864042 0.00098105]
Let's compute various evaluation metrices now
# Compute Evaluation Metric
compute_evaluation_metric(lgbmclassifier, X_test, y_test, y_grid_pred, y_prob_grid_pred)
Accuracy Score : 0.9797755726397309
AUC Score : 0.9409700338679997
Confusion Matrix :
[[170734 307]
[ 3276 2845]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.90 0.46 0.61 6121
accuracy 0.98 177162
macro avg 0.94 0.73 0.80 177162
weighted avg 0.98 0.98 0.98 177162
Concordance Index : 0.9409700247939532
ROC curve :
PR curve :
draw_calibration_curve(y_test, y_prob_grid_pred, n_bins=10)
# Calibrate
calibrated_clf = CalibratedClassifierCV(base_estimator=lgbmclassifier, cv=3)
calibrated_clf.fit(X_train, y_train)
y_pred_calib = calibrated_clf.predict(X_test)
y_prob_pred_calib = calibrated_clf.predict_proba(X_test)[:, 1]
draw_calibration_curve(y_test, y_prob_pred_calib, n_bins=10)
# Compute Evaluation Metric
compute_evaluation_metric(calibrated_clf, X_test, y_test, y_pred_calib, y_prob_pred_calib)
Accuracy Score : 0.9801763357830686
AUC Score : 0.9425583129340254
Confusion Matrix :
[[170598 443]
[ 3069 3052]]
Classification Report :
precision recall f1-score support
False 0.98 1.00 0.99 171041
True 0.87 0.50 0.63 6121
accuracy 0.98 177162
macro avg 0.93 0.75 0.81 177162
weighted avg 0.98 0.98 0.98 177162
Concordance Index : 0.9425583124564438
ROC curve :
PR curve :
# Gains Table and Capture rates plot
captures(y_test, y_pred_calib, y_prob_pred_calib)
| prob_bin | not_fraud | fraud | perc_fraud | perc_not_fraud | cum_perc_fraud | cum_perc_not_fraud | |
|---|---|---|---|---|---|---|---|
| 0 | Bin1((0.0194, 0.999]) | 12683 | 5034 | 0.822415 | 0.074152 | 82.241464 | 7.415181 |
| 1 | Bin2((0.0146, 0.0194]) | 17276 | 440 | 0.071884 | 0.101005 | 89.429832 | 17.515683 |
| 2 | Bin3((0.0134, 0.0146]) | 17484 | 232 | 0.037902 | 0.102221 | 93.220062 | 27.737794 |
| 3 | Bin4((0.0128, 0.0134]) | 17576 | 140 | 0.022872 | 0.102759 | 95.507270 | 38.013693 |
| 4 | Bin5((0.0125, 0.0128]) | 17620 | 96 | 0.015684 | 0.103016 | 97.075641 | 48.315316 |
| 5 | Bin6((0.0122, 0.0125]) | 17654 | 62 | 0.010129 | 0.103215 | 98.088548 | 58.636818 |
| 6 | Bin7((0.012, 0.0122]) | 17675 | 41 | 0.006698 | 0.103338 | 98.758373 | 68.970598 |
| 7 | Bin8((0.0119, 0.012]) | 17681 | 35 | 0.005718 | 0.103373 | 99.330175 | 79.307885 |
| 8 | Bin9((0.0117, 0.0119]) | 17686 | 30 | 0.004901 | 0.103402 | 99.820291 | 89.648096 |
| 9 | Bin10((0.010499999999999999, 0.0117]) | 17706 | 11 | 0.001797 | 0.103519 | 100.000000 | 100.000000 |
Hence we can freeze the model.
Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable.
lgbmclassifier = grid.best_estimator_
lgbmclassifier.feature_importances_
array([189, 60, 657, 568, 106, 58, 251, 89, 488, 10, 261, 250, 119,
222, 197, 17, 31, 63, 146, 28, 65, 103, 62, 171, 62, 298,
130, 193, 267, 144, 232, 96, 43, 266, 95, 195, 123, 39, 60,
94, 286, 0, 20, 29, 66, 61, 58, 8, 8, 16, 42, 81,
2, 0, 38, 31, 8, 1, 10, 2, 35, 14, 2, 1, 11,
86, 64, 0, 0, 8, 9, 4, 33, 0, 0, 2, 1, 2,
0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 22, 15, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
94, 172, 343, 52, 8, 232, 189, 344, 167, 188, 161, 135, 118,
114, 177, 100, 242, 188, 178, 169, 146, 185, 186, 199, 161, 167,
167, 136, 155, 141, 194, 184, 125, 110, 149, 186, 194, 192, 200])
feature_importance_df = pd.DataFrame({'feature' : X_train.columns, 'importance' : lgbmclassifier.feature_importances_ })
feature_importance_df = feature_importance_df.sort_values(by="importance", ascending=False)
feature_importance_df = feature_importance_df.iloc[:30,:]
feature_importance_df
| feature | importance | |
|---|---|---|
| 2 | card1 | 657 |
| 3 | card2 | 568 |
| 8 | addr1 | 488 |
| 501 | TransactionAmt_to_std_card1 | 344 |
| 496 | _Days | 343 |
| 25 | C13 | 298 |
| 40 | D15 | 286 |
| 28 | D2 | 267 |
| 33 | D8 | 266 |
| 10 | dist1 | 261 |
| 6 | card5 | 251 |
| 11 | P_emaildomain | 250 |
| 510 | PCA_V_7 | 242 |
| 499 | TransactionAmt_to_mean_card1 | 232 |
| 30 | D4 | 232 |
| 13 | C1 | 222 |
| 532 | PCA_V_29 | 200 |
| 517 | PCA_V_14 | 199 |
| 14 | C2 | 197 |
| 35 | D10 | 195 |
| 524 | PCA_V_21 | 194 |
| 530 | PCA_V_27 | 194 |
| 27 | D1 | 193 |
| 531 | PCA_V_28 | 192 |
| 500 | TransactionAmt_to_mean_card4 | 189 |
| 0 | TransactionAmt | 189 |
| 503 | PCA_V_0 | 188 |
| 511 | PCA_V_8 | 188 |
| 516 | PCA_V_13 | 186 |
| 529 | PCA_V_26 | 186 |
plt.figure(figsize=(16, 12));
sns.barplot(x="importance", y="feature", data=feature_importance_df.sort_values(by="importance", ascending=False));
plt.title('LGB Features');
## pdp plots
from sklearn.inspection import partial_dependence, plot_partial_dependence
from sklearn.utils import validation
Fit the model
lgbmclassifier.fit(X_train, y_train)
lgbmclassifier.dummy_ = "dummy"
validation.check_is_fitted(estimator=lgbmclassifier)
Plot Partial Dependence
fig = plt.figure(figsize=(16, 12))
plot_partial_dependence(lgbmclassifier, X, ['card2'])
plt.show()
<Figure size 1152x864 with 0 Axes>
Individual Conditional Expectation (ICE) Plot - card2
plot_partial_dependence(lgbmclassifier, X, ['card2'], kind='both')
<sklearn.inspection._plot.partial_dependence.PartialDependenceDisplay at 0x11225b5bd68>
Partial Dependence and ICE Plot - C13
fig = plt.figure(figsize=(16, 12))
plot_partial_dependence(lgbmclassifier, X, ['C13'], kind='both')
plt.show()
<Figure size 1152x864 with 0 Axes>
fig = plt.figure(figsize=(16, 12))
plot_partial_dependence(lgbmclassifier, X, ['C13'])
plt.show()
<Figure size 1152x864 with 0 Axes>
SHAP values is used to reverse engineer the output of the prediction model and quantify the contribution of each predictor for a given prediction.
import shap
shap_model = shap.TreeExplainer(lgbmclassifier)
shap_values = shap_model.shap_values(X_train)
You can make a partial dependence plot using shap.dependence_plot. This shows the relationship between the feature and the Y. This also automatically includes another feature that your feature interacts frequently with.
# card2
shap.dependence_plot("card2", shap_values[0], X_train)
# card3
shap.dependence_plot("card3", shap_values[0], X_train)
Explain a single observation.
shap.initjs() # needed to show viz
shap.force_plot(shap_model.expected_value[1], shap_values[1][14], X_train.iloc[14, :])
Add link = "logit"
shap.initjs() # needed to show viz
shap.force_plot(shap_model.expected_value[1], shap_values[1][14], X_train.iloc[14, :], link='logit')
y_pred_calib_tr[14]
array([0.89570106, 0.10429894])
# compute SHAP values
explainer = shap.Explainer(lgbmclassifier, X_train) # , link=shap.links.logit)
shap_values_waterfall = explainer(X_train[:100])
# visualize the first prediction's explanation
shap.plots.waterfall(shap_values_waterfall[0])
The model has been trained and tested, so now one can use it to predict if any transaction would be fraud or not.